Volume 77, Issue 2 p. 307-321

Pupils' over-reliance on linearity: A scholastic effect?

Wim Van Dooren

Corresponding Author

Center for Instructional Psychology and Technology, University of Leuven, Belgium

Correspondence should be addressed to Wim Van Dooren, Center for Instructional Psychology and Technology, Vesaliusstraat 2, B-3000 Leuven, Belgium (e-mail: [email protected]).

Dirk De Bock

Center for Instructional Psychology and Technology, University of Leuven, Belgium

EHSAL – European University College Brussels, Belgium

Dirk Janssens

Department of Mathematics, University of Leuven, Belgium

Lieven Verschaffel

Center for Instructional Psychology and Technology, University of Leuven, Belgium

First published: 24 December 2010
Citations: 21

Abstract

Background. From upper elementary education on, children develop a tendency to over-use linearity. Particularly, it is found that many pupils assume that if a figure enlarges k times, the area enlarges k times too. However, most research was conducted with traditional, school-like word problems.

Aims. This study examines whether pupils also over-use linearity if non-linear problems are embedded in meaningful, authentic performance tasks instead of traditional, school-like word problems, and whether this experience influences later behaviour.

Sample. Ninety-three sixth graders from two primary schools in Flanders, Belgium.

Method. Pupils received a pre-test with traditional word problems. Those who made a linear error on the non-linear area problem were subjected to individual interviews. They received one new non-linear problem, in the S-condition (again a traditional, scholastic word problem), D-condition (the same word problem with a drawing) or P-condition (a meaningful performance-based task). Shortly afterwards, pupils received a post-test, containing again a non-linear word problem.

Results. Most pupils from the S-condition displayed linear reasoning during the interview. Offering drawings (D-condition) had a positive effect, but presenting the problem as a performance task (P-condition) was more beneficial. Linear reasoning was nearly absent in the P-condition. Remarkably, at the post-test, most pupils from all three groups again applied linear strategies.

Conclusions. Pupils' over-reliance on linearity seems partly elicited by the school-like word problem format of test items. Pupils perform much better if non-linear problems are offered as performance tasks. However, a single experience does not change performances on a comparable word problem test afterwards.

Owing to its wide applicability in mathematical, scientific and everyday-life problems, linearity is a key concept throughout primary and secondary mathematics education. Inherent in the attention paid to this concept, however, is the risk that pupils develop an over-reliance on linearity, and also display linear reasoning in situations where another mathematical model is present. This tendency seems present at various ages (from second graders to adults) in many mathematical subdomains. For instance, Van Dooren, De Bock, Hessels, Janssens, and Verschaffel (2005) offered sixth graders missing-value problems such as ‘Ellen and Kim are running around a track. They run equally fast but Ellen started later. When Ellen has run 5 rounds, Kim has run 15 rounds. When Ellen has run 30 rounds, how many rounds has Kim run?’ They found that more than half of the sixth grade pupils applied a linear method (e.g. 5×3 = 15, so 30×3 = 90) instead of an additive one (5+10 = 15, so 30+10 = 40). The over-reliance on linearity has also been observed in problem situations in algebra (Stacey, 1989), calculus (Esteley, Villarreal, & Alagia, 2004) and probability (Van Dooren, De Bock, Depaepe, Janssens, & Verschaffel, 2003).
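The contrast between the two methods in the runner item can be made concrete with a small sketch. The function names are illustrative and not taken from the study. The sketch assumes only what the item states: both girls run equally fast, so the gap between them stays constant.

```python
def linear_prediction(ellen_before, kim_before, ellen_after):
    """Erroneous proportional reasoning: keep the ratio Kim/Ellen constant."""
    ratio = kim_before / ellen_before
    return ellen_after * ratio


def additive_prediction(ellen_before, kim_before, ellen_after):
    """Appropriate reasoning for equal speeds: keep the difference Kim - Ellen constant."""
    head_start = kim_before - ellen_before
    return ellen_after + head_start


print(linear_prediction(5, 15, 30))    # 90.0, the typical linear error
print(additive_prediction(5, 15, 30))  # 40, the correct answer
```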

This paper focuses on the most extensively studied case of pupils' over-reliance on linearity, namely in the domain of geometry: many pupils – even prospective teachers and engineers – assume that when a figure is enlarged or reduced k times, the area and/or volume will enlarge or reduce k times too, while in fact the area and volume enlarge k² times and k³ times, respectively (De Bock, Verschaffel, & Janssens, 1998; Modestou, Gagatsis, & Pitta-Pantazi, 2004; Mogensen, 2004; Simon & Blume, 1994; Tierney, Boyd, & Davis, 1990). Empirical studies (De Bock et al., 1998; Modestou et al., 2004) showed that more than 80% of 12- to 16-year-olds inappropriately answered word problems such as ‘Farmer Carl needs 6 bags of grass seed to cover a square pasture with a side of 200 metres. How many bags of grass seed will he need to cover a square pasture with a side of 600 metres?’ (giving the answer 3×6 = 18 bags instead of 3²×6 = 54 bags). Even with considerable support (such as providing drawings, instructing to make drawings, or giving meta-cognitive hints), the vast majority of pupils in these studies continued to display linear reasoning. De Bock, Van Dooren, Janssens, and Verschaffel (2002) revealed that the error was due to a complex interaction of factors: (a) the intuitiveness of the linear model; (b) certain shortcomings in pupils' geometrical knowledge; (c) a poor use of heuristics; and (d) inadaptive habits and beliefs towards mathematical word problem solving.
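The quadratic and cubic relations follow directly from the area and volume formulas. As a brief worked illustration for a square with side s and a cube with edge s (the symbols s, A and V are introduced here only for this illustration and do not appear in the original text):

```latex
\begin{align*}
A(ks) &= (ks)^2 = k^2 s^2 = k^2 \, A(s), \\
V(ks) &= (ks)^3 = k^3 s^3 = k^3 \, V(s).
\end{align*}
```

In the pasture item the side grows from 200 to 600 metres, so the enlargement factor is 600/200 = 3 and the seed required grows by a factor of 9 rather than 3.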

These last two explanatory factors – i.e. pupils' poor use of heuristics and their attitudes and beliefs towards word problem solving – seem to be related to the kind of tasks that were applied. In the previously mentioned studies (De Bock et al., 1998; Modestou et al., 2004), tests containing traditional, rather inauthentic word problems were taken in a classical scholastic context. Such tests may trigger in pupils a set of implicit rules and expectations established by the socio-mathematical norms of the classroom setting (see, e.g. Boaler, 1993; Cobb, Yackel, & McClain, 2000; Cooper & Harries, 2002; Lave, 1988, 1992; Palm, 2002; Reusser, 1988; Verschaffel, Greer, & De Corte, 2000; Wyndhamn & Säljö, 1997). So it could be argued that pupils who over-relied on linearity in the above-mentioned studies did not invest sufficient mental effort and did not activate potentially helpful (meta)cognitive strategies in tackling the problems – assuming that they were dealing with routine word problems. They may, moreover, have excluded some considerations (e.g. that the obtained solution should look acceptable compared with the real-world situation that it refers to) and solution strategies (such as checking the viability of a solution by making a sketch of the problem situation) – assuming that they were not required or even not acceptable in that context (Palm, 2002). Empirical evidence shows that, for various kinds of modelling problems, pupils are more inclined to refrain from their stereotyped problem-solving behaviour and to include essential particularities of the real-world situation in their solutions when problems are disentangled from their scholastic chains and embedded in more meaningful, authentic tasks (e.g. DeFranco & Curcio, 1997; Lave, 1988; Nunes, Schliemann, & Carraher, 1993; Palm, 2002; Reusser & Stebler, 1997; for an overview of such studies, see Verschaffel et al., 2000).

The term ‘authenticity’ has no single meaning, but in general, authors use it to refer to tasks that are encountered in real, out-of-school situations (or to truthful, high-fidelity simulations thereof in a school context), and they contrast it with ‘inauthentic’ mathematical tasks, which are the low-fidelity simulations typically encountered in school settings (Palm, 2002). Several authors (e.g. Cooper, 1992, 1994; Lave, 1988, 1992; Lesh, 1992; Palm, 2002; Verschaffel et al., 2000) mention essential differences between the two kinds of tasks. They are summarized in Table 1 (for an elaborate discussion, see Palm, 2002). The more a task resembles the right column of Table 1, the more authentic it can be called.

Table 1. Characteristics of authenticity and inauthenticity in mathematical tasks

There are already studies that indicate a beneficial effect of increasing the task's authenticity on breaking pupils' tendency to over-use linear methods. Reusser and Stebler (1997) used items from studies by Greer (1993) and Verschaffel, De Corte, and Lasure (1994), for which upper elementary pupils tend to give stereotyped, unrealistic answers. Among these, there were items for which pupils over-used linearity:

  • ‘John's best time to run 100 metres is 17 seconds. How long will it take him to run 1 kilometre?’ (erroneous linear answer: 10×17 = 170 seconds).

  • ‘A flask is being filled from a tap at a constant rate. If the depth of the water is 4 cm after 10 seconds, how deep will it be after 30 seconds?’ (with a picture of a cone-shaped flask shown) (erroneous linear answer: 3×4 = 12 cm).

  • ‘A man wants to have a rope long enough to stretch between two poles 12 metres apart, but he only has pieces of rope 1.5 metres long. How many of these pieces would he need to tie together to stretch between the poles?’ (erroneous linear answer: 12/1.5 = 8 pieces).

Reusser and Stebler (1997) gave these problems first as part of a typical scholastic paper-and-pencil test, and the day after in a performance setting with concrete materials (e.g. pieces of rope, scissors and a metre stick for the rope problem or a cone-shaped flask and a jug of water for the flask problem) and a clear performance instruction (namely: investigate the problem using the materials, make a prediction about the answer, execute the task and write down your final answer). Pupils were remarkably less inclined to make the linear error: compared with the paper-and-pencil condition, the percentage of correct answers increased from 12 to 47% for the runner problem, from 7 to 40% for the flask problem and from 18 to 62% for the rope problem.

The goal of the current study was to investigate whether such an approach would also be effective to break pupils' linear reasoning tendency on problems about the effect of the enlargement of a figure on its area.1 We emphasize that we did not simply seek for yet another confirmation of the beneficial effect of authentic task settings for a new kind of non-linear situation. The over-reliance on linearity in the domain of length–area problems is shown to be very general and extremely resistant to different, even strong kinds of help (De Bock et al., 1998, 2002; Modestou et al., 2004). We hoped to gain a deeper understanding of its resistance and its underlying roots by investigating pupils' reactions in a different experimental context. An important difference with the Reusser and Stebler (1997) study is moreover that in their study, pupils could rely on the linear model but should interpret the outcome of their linear calculations by considering the original problem context – which may indeed be facilitated by more authentic tasks (Palm, 2002). For the length–area problems in our study, however, the linear model is simply not valid, and a considerably more complex mathematical model needs to be applied instead. A final contrast with previous research on the authenticity effect is that we wanted to investigate whether the experience of correctly solving an authentic non-linear problem would have a beneficial effect on pupils' solutions on traditional school word problems on this topic as well.

Actually, a study by De Bock, Verschaffel, Janssens, Van Dooren, and Claes (2003) had a comparable aim: to investigate the effect of task authenticity on pupils' over-reliance on linearity for problems on lengths, areas and volumes of geometrical figures. De Bock et al. did this by prefacing a paper-and-pencil test by an assembly of video fragments telling the story of Gulliver's visit to the isle of the Lilliputians – where all lengths are 12 times smaller – and linking test items directly to these fragments. For instance, pupils in the ‘authentic’ condition were given problems such as ‘Gulliver's handkerchief has an area of 1,296 cm². What is the area of a Lilliputian handkerchief?’, whereas control condition pupils received parallel versions formulated as traditional school problems not embedded in contextually meaningful settings: ‘The side of square Q is 12 times as large as the side of square R. If the area of square Q is 1,296 cm², what's the area of square R?’ Contrary to De Bock et al.'s expectations, pupils in the ‘authentic’ condition did not outperform control condition pupils. They afterwards admitted that watching a video and completing a paper-and-pencil test about that video was a very weak operationalization of authenticity, and they argued that ‘probably, a more performance-based form of assessment (…) is needed’ (De Bock et al., 2003, p. 248).

In line with their suggestion, we wanted to observe the effect of a more radical operationalization of authenticity. When pupils are confronted with a genuinely meaningful, performance-based mathematical task, do they approach it differently compared with when they tackle a traditional school word problem, and does this positively affect their performances when tackling non-linear geometry problems? Also, is there an effect on pupils' word problem-solving behaviour afterwards?

Method

Participants

The study was conducted with 93 sixth graders, i.e. pupils aged between 11 and 12 (five class groups from two medium-sized elementary schools in Flanders, attracting average pupil populations). In the fourth and fifth grades, these pupils had already encountered the concepts of area and volume and learnt how to calculate areas and volumes of various basic geometrical figures. So, in these years, they were taught – by their regular teachers – the specific mathematical content knowledge and skills needed for solving all the non-linear area problems used in the study.

The study was conducted in three steps. First, a pre-test was administered to all pupils. Next, pupils who made the envisaged linear error on the pre-test were involved in an individual interview. Third, all participants were presented with the same post-test. Each step is explained in more detail below.

Pre-test

A paper-and-pencil pre-test with six word problems was administered to pupils in their regular classroom by a member of the research team. Five of the word problems in the pre-test were buffer items, intended to distract pupils from the actual focus of the study. One was the experimental item, aimed at detecting whether pupils tended to give a linear answer to problems about the effect of an enlargement on the area of a square.

Two versions of the test were constructed. Half of the pupils received version A, half received version B. At the post-test (see below), each pupil received the other version. The experimental items in test versions A and B were problems about the area of a square of which the sides were doubled:

  • Version A: ‘John needs 15 minutes to paint a square ceiling with a side of 3 metres. Approximately how much time will he need to paint a square ceiling with a side of 6 metres?’

  • Version B: ‘Carl needs 8 hours to manure a square field with a side of 200 metres. Approximately how much time will he need to manure a square field with a side of 400 metres?’
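For reference, the linear and the quadratic answers to both experimental items can be worked out with a short sketch. It assumes, as the items implicitly do, that the time needed is proportional to the area to be covered. The function name and the `exponent` parameter are illustrative and not part of the study materials.

```python
def scaled_time(base_time, base_side, new_side, exponent):
    """Time needed when the side changes and time is assumed to scale as k**exponent."""
    k = new_side / base_side  # enlargement factor of the side
    return base_time * k ** exponent


items = {
    "Version A (minutes)": (15, 3, 6),
    "Version B (hours)": (8, 200, 400),
}
for label, (time, side, new_side) in items.items():
    linear = scaled_time(time, side, new_side, exponent=1)   # erroneous linear answer
    correct = scaled_time(time, side, new_side, exponent=2)  # area grows with k squared
    print(label, "linear:", linear, "correct:", correct)
```

Doubling the side quadruples the area, so the appropriate answers are 60 minutes and 32 hours, whereas linear reasoning yields 30 minutes and 16 hours.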

Both items were considered equivalent because they dealt with mathematically comparable situations and problem formulations were kept as similar as possible. It can, moreover, be assumed that for sixth graders, the numbers in both versions cause no differences in calculation difficulty. In a previous study by Van Dooren, De Bock, Hessels, Janssens, and Verschaffel (2004), both items were administered simultaneously, and Cronbach's α was .81.

Interview procedure

Two or three days after the pre-test, the pupils who made a linear error on the experimental pre-test item were individually taken out of the classroom for an in-depth interview. These pupils were given one non-linear problem, this time about the effect of tripling the lengths of the sides of a square on its area. The problem was offered in one of three different ways, depending on the experimental condition that the pupil was assigned to. Assigning pupils to conditions was carried out by matching, based on pupils' mathematics performances on formative school assessments during the previous months.
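The text does not describe the matching procedure in detail. One common way to match on a prior score, shown here purely as a sketch under that assumption, is to rank pupils by score, cut the ranking into blocks of three, and distribute each block randomly over the three conditions. The pupil identifiers and scores below are invented for illustration, and `matched_assignment` is a hypothetical helper, not the authors' procedure.

```python
import random


def matched_assignment(scores, conditions=("S", "D", "P"), seed=0):
    """Assign pupils to conditions in score-matched blocks.

    scores: dict mapping a pupil identifier to a prior mathematics score.
    Returns a dict mapping each pupil identifier to a condition label.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    assignment = {}
    block_size = len(conditions)
    for start in range(0, len(ranked), block_size):
        block = ranked[start:start + block_size]
        labels = list(conditions)
        rng.shuffle(labels)
        # zip stops at the shorter sequence, so an incomplete last block is fine
        for pupil, label in zip(block, labels):
            assignment[pupil] = label
    return assignment


# Hypothetical data: 72 pupils with invented scores between 40 and 100.
data_rng = random.Random(1)
pupils = {f"pupil_{i:02d}": data_rng.randint(40, 100) for i in range(72)}
groups = matched_assignment(pupils)
print({c: sum(1 for v in groups.values() if v == c) for c in "SDP"})
# {'S': 24, 'D': 24, 'P': 24}
```

Blocking on the ranked score keeps the three groups comparable in prior achievement while leaving the assignment within each block to chance.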

Pupils in the S-condition (‘Scholastic’ condition) received a sheet with the problem formulated as a traditional word problem, presented similarly as the non-linear word problem in the pre-test and post-test:

Recently, I made a doll's house for my sister. One of the rooms had a square floor with sides of 12 cm. I needed 4 square tiles to cover it. Another floor of the doll's house was also a square, but with sides of 36 cm. How many of those square tiles did I need to cover it?

The D-condition (‘Drawing’ condition) was exactly the same as the S-condition, but this time the sheet also contained a drawing to scale of the small and large figure, as shown in Figure 1.

Figure 1. Drawing offered in the D-condition.

Both in the S- and D-condition, the answer that the pupil wrote down as his/her final answer was registered and analyzed.
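The arithmetic behind the doll's house item can be sketched as follows. The function name is illustrative, and the integer division assumes that the large side is a whole multiple of the small side, as it is in this item.

```python
def tiles_needed(small_side_cm, small_tiles, large_side_cm):
    """Return (linear_error, correct_answer) for the doll's house item."""
    k = large_side_cm // small_side_cm      # enlargement factor of the side
    linear_error = k * small_tiles          # side times 3, so tiles times 3
    correct_answer = k ** 2 * small_tiles   # side times 3, so area times 9
    return linear_error, correct_answer


print(tiles_needed(12, 4, 36))  # (12, 36)
```

The expected linear error is therefore 12 tiles, whereas 36 tiles are actually needed to cover the large floor.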

In the P-condition (‘Performance task’ condition), pupils were involved in the actual problem situation with real materials (the small doll's house floor, 4 tiles and the large doll's house floor) and were asked to perform an authentic action. The task presentation was standardized as follows:

I have a little sister, and currently I am making a doll's house for her. Here, you can see the floor of one of the rooms. Can you tell me its shape? [The pupil tells that it is a square.] Let's measure it. [The pupil observes that the sides are 12 cm long.] I have some tiles that we can use to cover that floor. Can you do that? [Pupil puts 4 tiles on the small floor.] Indeed, we need 4 tiles to cover that floor.

I also brought another floor of the doll's house. As you can see, it is also square. Let's measure it as well. [Pupil observes that the sides are 36 cm.] In a few moments, we are going to put tiles on this large floor as well. Now, think about how many tiles we will need to do that. If you have decided you can go to the table over there and fetch exactly enough tiles.

In the P-condition, the number of tiles that the pupils actually fetched was registered as the final answer. At the end of the interview, P-condition pupils were allowed to put the tiles effectively on the large floor (but note that their final answer was already registered before this happened). Unavoidably, this revealed whether their answer was correct or not. When pupils observed that they did not bring enough tiles, they could fetch additional tiles to cover the entire floor.

It is clear that the task settings in the S- and D-conditions bear a strong resemblance to the inauthentic task characteristics (left column of Table 1), whereas the P-condition corresponds more to an authentic task setting. Note that for our research goal, strictly speaking, the experimental design would only need to include an S-condition and a P-condition. Nevertheless, a D-condition was included for the following reason: pupils in the P-condition not only received the non-linear problem as a performance task instead of a school-like word problem; the problem presentation in the P-condition also gave additional visual support that was not present in the S-condition. By including a D-condition – which provided visual support equivalent to that of the P-condition but kept the scholastic setting of the S-condition – we could appropriately control for this factor.

At the beginning of the interview, pupils were told that they could tackle the problem in whatever way they wanted and use all materials available (pencil, paper, ruler, pocket calculator, in the D-condition also the drawings, and in the P-condition the small and large floor and the four available tiles). Pupils were asked to think aloud while solving the problem (Ginsburg, Kossan, Schwartz, & Swanson, 1982). When they immediately came up with an answer without verbalizing their thinking, the interviewer asked some standardized probing questions (‘Where does that answer come from?’, ‘Can you explain how you found that answer?’). These questions were posed only after the final answer was registered, to ensure that these probing questions would not affect pupils' answers.

At the end of the interview pupils were also asked to indicate on a five-point scale how certain they were about the correctness of their answer (‘certainly wrong’, ‘probably wrong’, ‘no idea’, ‘probably correct’, and ‘certainly correct’) and to justify this. Of course, in the P-condition this happened before pupils were allowed to put their tiles on the large floor. Pupils' scores on this certainty rating scale could already reveal interesting differences between the interview conditions, but we were especially interested in the kind of justification that pupils would provide for their ratings, and in the sources that they would refer to. Here, Miller's (1986) distinction between mere certainty and certainty accompanied by feelings of necessity could be relevant: in some cases, pupils can be certain about something because they see it as a logically necessary truth (e.g. ‘All black cats are black’), but in other cases they may be equally certain without this feeling of necessity (e.g. ‘Paris is the capital of France’). In our study, the justifications could reveal whether pupils who are certain about their answer see their answer as necessarily true or not, and whether this differs between the interview conditions.

Post-test

Either 1 or 2 days after their interview, the pupils received a post-test. This test was given to all pupils who participated in the pre-test rather than only to the interviewed pupils, to avoid pupils becoming too suspicious about why they were (or were not) tested. Only the results of the pupils who were involved in the actual study (i.e. the ones who made a linear error on the pre-test and who were individually interviewed) are reported in this paper. The post-test again was taken in pupils' regular classroom under supervision of the researcher. As mentioned before, pupils who were given test version A at the pre-test received version B at the post-test and vice versa.

Results

Pre-test

Altogether, 72 of the 93 pupils gave a linear answer to the experimental item at the pre-test. There were no performance differences between the five class groups, between the boys and girls in the sample, or between pupils with higher and lower overall mathematics scores. All pupils making a linear error were assigned to one of the three interview conditions (N = 24 in each condition).

Individual interviews

Table 2 provides a summary of the final answers and response times in the three interview conditions. As shown in the table, there was a statistically significant impact of the interview condition on pupils' answers (Fisher exact test, p < .00015).

Table 2. Overview of answers and average response times (in seconds) in each interview condition

In the S-condition, nearly all pupils (21 of 24) again committed a linear error on the interview item. This confirms pupils' deep-rooted tendency to apply linear methods when tackling problems about areas of enlarged figures, as observed in other studies (De Bock et al., 1998, 2002; Modestou et al., 2004). Two pupils committed another error and one pupil found the correct solution (in contrast with the pre-test, this pupil now made a drawing himself, which apparently helped him to find the correct solution). In the D-condition, performances were considerably better (a Fisher exact test yielded p < .00015 for the contrast between the S- and D-conditions): 16 of 24 pupils found the correct answer during the interview. We were surprised by this result, since studies by De Bock et al. (1998, 2002) had shown that providing drawings hardly affects performance (because pupils often neglect them). In the current study, pupils actually used the drawing more often and, therefore, their solution process clearly benefited from it. Possibly, pupils in our study felt more obliged to do so because they were involved in an individual interview context, whereas the studies of De Bock et al. were conducted with collective, written tests. Despite the beneficial impact of drawings for 16 pupils, it should be noted that 8 of 24 D-condition pupils gave a linear answer. The P-condition, wherein the non-linear problem was presented as an authentic performance task, yielded the best performance: 20 pupils gave the correct answer, and only 2 pupils displayed linear reasoning (2 pupils made another error). A contrast analysis of the D- and P-condition answers revealed that this difference was also statistically significant (Fisher exact test, p = .0412). In sum, almost all pupils in the S-condition made a linear error as they had done at the pre-test. Providing a drawing had a positive effect, but still one-third of the D-condition pupils made the linear error. Offering the problem as an authentic performance-based task was even more beneficial. In the P-condition, linear errors were almost absent.
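The text reports Fisher exact tests but does not say how they were computed or which table layout each contrast used. As a hedged sketch, the generalization of Fisher's exact test to tables larger than 2 × 2 (often called the Fisher-Freeman-Halton test) can be written in plain Python by enumerating every table that shares the observed margins. The counts are those given in the paragraph above; the function names are illustrative, and the p-values produced by this sketch need not coincide with the published ones.

```python
from fractions import Fraction
from math import factorial, prod


def table_probability(table):
    """Exact probability of a table given its margins (multivariate hypergeometric)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    numerator = prod(factorial(m) for m in row_sums + col_sums)
    denominator = factorial(sum(row_sums)) * prod(
        factorial(cell) for row in table for cell in row
    )
    return Fraction(numerator, denominator)


def bounded_rows(total, caps):
    """Yield tuples of non-negative ints summing to `total`, with x[i] <= caps[i]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in bounded_rows(total - first, caps[1:]):
            yield (first,) + rest


def tables_with_margins(row_sums, col_sums):
    """Yield every table of non-negative ints with the given row and column sums."""
    if len(row_sums) == 1:
        yield [tuple(col_sums)]  # the last row is forced by the remaining column sums
        return
    for row in bounded_rows(row_sums[0], col_sums):
        remaining = [c - x for c, x in zip(col_sums, row)]
        for rest in tables_with_margins(row_sums[1:], remaining):
            yield [row] + rest


def exact_p_value(table):
    """Sum the probabilities of all tables that are no more likely than the observed one."""
    observed = table_probability(table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return sum(
        p
        for t in tables_with_margins(row_sums, col_sums)
        if (p := table_probability(t)) <= observed
    )


# Rows: S-, D-, P-condition; columns: correct, linear error, other error.
interview = [(1, 21, 2), (16, 8, 0), (20, 2, 2)]
print(float(exact_p_value(interview)))      # all three conditions
print(float(exact_p_value(interview[:2])))  # S- versus D-condition
print(float(exact_p_value(interview[1:])))  # D- versus P-condition
```

Using `Fraction` keeps the comparison with the observed table's probability exact, which avoids the floating-point tolerance that would otherwise be needed when deciding which tables count as equally or less likely.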

Given that not only the P-condition but also the D-condition yielded a relatively large number of correct solutions (20 and 16 out of 24 pupils, respectively, vs. only 1 out of 24 in the S-condition), we also performed a more fine-grained comparative analysis of the correct solutions in these two groups, with a view to determining possible differences in the way the correct solutions had originated and in pupils' feelings of certainty about the correctness of the answer. Before answering these questions, we emphasize that only a few pupils were able to verbalize their problem-solving processes clearly. This comparison is therefore necessarily based mainly on the other data we were able to collect and analyze about pupils' problem-solving process, namely response times (cf. Table 2), observable use of the drawings or materials, and certainty scores (see the overview in Table 3) with accompanying justifications. However, before presenting these contrastive data, we illustrate, in Table 4, the major differences between the correct solution processes in the two conditions by means of two typical interview protocols.

Table 3. Distribution of ‘certainty’ scores given by correctly answering pupils in the D- and P-conditions

Table 4. Illustrative thinking aloud protocols of a student in the D- and P-conditions (both finding the correct answer)

In the D-condition, all 16 pupils who found the correct answer required a relatively long time (on average 139 seconds, cf. Table 2), and had to rely extensively on the drawing. All of them used the drawing in some way (mostly by drawing the tiles on the large floor drawing as well, or by dividing the large floor into small ones), but this idea often came up rather late in the problem-solving process, after a long time of thinking and (re)reading the problem. Five pupils initially applied a linear method, and abandoned it only after they felt that they needed to ‘do something with the drawing too’. Once they had given the correct answer, many pupils were still not totally convinced about its correctness (cf. Table 3). When referring to sources of certainty, pupils from the D-group (even those indicating ‘certainly correct’) never expressed feelings that the answer was necessarily correct. Justifications rather related to (the strength of) their own mathematical abilities, to a feeling of ‘having conducted the required calculations’, and/or to the fact that a recalculation had confirmed the correctness of the first calculation. When referring to sources of uncertainty, pupils indicated that various reading, interpretation or calculation errors might have occurred, or that they might have overlooked a critical aspect of the problem situation.

The above observations from the D-condition contrast with the processes observed in the P-condition. In that condition, pupils needed much less time to respond correctly: on average only 76 seconds (cf. Table 2). In most cases, they immediately and spontaneously started manipulating the materials, working towards the correct solution (figuring out rather quickly that 6 × 6 tiles fit on the large floor, or that the small floor fits 3 × 3 times on it). Also, none of them initially applied a linear method. Three pupils from the P-condition gave the correct answer immediately once the problem situation was explained to them (while this never happened in the D-condition): these pupils just ‘saw’ the correct solution at a glance. Once they had given the answer, pupils in the P-condition were also more convinced of its correctness than the pupils from the D-condition (cf. Table 3). When justifying their certainty rating, they did not refer to just having done the required calculations or to their own mathematical abilities (as many pupils in the D-condition did). Rather, they used expressions stating that the answer was necessarily correct, arguing that the solution was logical or evident whereas any alternative was just unthinkable, or claiming that they could just see the correctness of the answer.

Post-test

Table 5 shows the answers on the non-linear post-test item for the different interview conditions and interview responses separately. The post-test aimed at determining whether pupils who had given the correct answer to the non-linear item during the interview, would do this on the post-test (taken 1 or 2 days later) as well. Interestingly, this occurred only rarely. The single S-condition pupil who found the correct solution during the interview answered the post-test problem correctly too, again by making a drawing. However, in the other two conditions, almost no effect of the interview experience was found. While 16 of the 24 D-condition pupils profited from the drawings to find the correct answer in the individual interview, all of them again displayed linear reasoning on the post-test, and whereas 20 of the 24 pupils found the correct answer in the P-condition interview, only 2 of these 20 pupils answered the post-test item correctly (and all others, except 1, again made a linear error).

Table 5. Overview of pupils' answers at the post-test item in relation to their answer during the interview

Finally, of the 4 P-group pupils who failed to find the correct answer to the problem by themselves during the interview (appearing in the bottom two lines of Table 5), but who were allowed to put their tiles on the floor at the end of the interview, only 1 pupil gave a correct answer to the non-linear post-test item; the other 3 gave the same answer on the post-test as during the interview.

Discussion

Recently, there have been several reports that pupils over-rely on linearity to tackle problems about the area and/or volume of enlarged figures (e.g. De Bock et al., 1998, 2002, 2003; Modestou et al., 2004). They observed this tendency by means of tests containing traditional, scholastic word problems. Our study has shown that the choice of this method has an important impact on pupils' solution behaviour. When pupils who made a linear error on a pre-test were involved in an interview with a more authentic performance-based task, they were considerably less likely to overgeneralize the linear model when approaching that task, compared with pupils who received a traditional word problem or even a word problem with drawings. Our findings support Boaler's (2000, p. 118) argument for a consideration of the ‘macro context’ in which research is conducted, with a view to ‘extending our focus beyond the concepts and procedures that pupils learn to the practices in which they engage as they are learning them and the mediation of cognitive forms by the environments in which they are produced’. When pupils approach traditional word problems, we can indeed observe a clear and deep-rooted tendency to over-rely on linearity, but such a method arguably reveals more about the specific kind of ‘practices’ (Boaler, 2000; Lave, 1988) that pupils engage in when they are in a traditional school word problem-solving context, than it tells about their actual capacities for mathematical reasoning: ‘The activity of solving word problems and the contents of word problems in school are not the same as “the same” activity or contents embedded in other systems of activity in other parts of life’ (Lave, 1992, p. 89).

In general terms, our results are compatible with the conclusion of other studies that increasing the authenticity of a task positively affects pupils' responses to that task (e.g. Cooper & Harries, 2002; DeFranco & Curcio, 1997; Lave, 1988; Nunes et al., 1993; Palm, 2002; Reusser & Stebler, 1997; Verschaffel et al., 2000). More specifically, our study showed, as was already suggested by Reusser and Stebler's study described above, that pupils' tendency to over-use linear methods declines when problems are disentangled from their scholastic chains and are embedded in a meaningful task setting. However, our study went further than just providing additional empirical evidence for this general claim, in two important senses.

First, Reusser and Stebler (1997) focused on whether pupils would consider the outcome of their calculations within the context of the original problem situation. For the runner item, for example, pupils should be aware that their linear calculations (‘It takes 10 × 17 seconds to run 10 × 100 metres’) could serve only as an approximation while the real answer would be somewhat higher. Or for the ropes item, the pupils should realize that linear calculations were indeed relevant (8 pieces of rope each being 1.5 metres long indeed have a total length of 12 metres) but one will need one or maybe two additional pieces of rope to compensate for the knots. In our study, however, we showed that a meaningful, authentic task setting also enables pupils to avoid the use of the linear model in a situation where it is simply not valid (not even as a tentative approximation), and to select or develop a considerably more complex mathematical model instead.
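To make this contrast concrete, the following worked equations sketch why a linear model cannot serve even as an approximation for the area of an enlarged square. The symbols s (side length) and k (scale factor) and the numerical values are illustrative assumptions, not the actual items used in the study.

```latex
% Illustrative only: s, k and the numbers below are assumptions,
% not the experimental items from the study.
\begin{align*}
A(s)  &= s^{2} \\
A(ks) &= (ks)^{2} = k^{2}s^{2} = k^{2}\,A(s) \\
\text{e.g. } s = 2,\ k = 3:\quad A(6) &= 36 = 9 \cdot A(2),
  \quad \text{whereas the linear model predicts } 3 \cdot A(2) = 12.
\end{align*}
```

By comparison, the ropes calculation (8 × 1.5 metres = 12 metres) rests on a relation that really is linear and only needs a realistic correction for the knots, whereas in the area case the linear prediction is off by a factor of k.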

Second, our findings indicate that the effect of offering a meaningful, performance-based task was temporary and context-specific, with only marginal impact on pupils' future problem-solving behaviour. On a post-test, which again contained traditional word problems and which was taken only a few days later, nearly all pupils reverted to a linear answer.² Although the interview task was not designed for instructional purposes but merely had a diagnostic goal, it remains remarkable that pupils who had solved the problem correctly during the interview did not notice the isomorphism between the problem situation offered in the interview, on the one hand, and the problem situation described in the post-test item, on the other hand (Reed, 1999). While we cannot exclude that some children who failed on the post-test problem may have noticed and tried to exploit the analogy between that problem and the one they had solved during the interview session, we believe that most children from the P-condition unconsciously or deliberately did not make the connection between the two problems, because they experienced both task settings as completely different, and therefore as also requiring different problem-solving approaches and different solutions. This interpretation, however, requires further empirical research. Retrospective in-depth interviews specifically dealing with pupils' task perceptions while being tested or interviewed may provide a better understanding of how the different experimental settings influence their problem-solving processes and their responses. Further empirical research is also needed to decide which other measures have to be taken to guarantee stable learning and transfer effects among learners from valuable problem-solving experiences around authentic performance tasks such as the ones provided in the P-condition. Our findings nevertheless suggest, as Boaler (1993, p. 14) phrased it, that ‘assumptions regarding enhanced understanding and transfer as a result of learning in context may be over simplistic’. The use of authentic mathematical tasks as such may indeed strengthen pupils' motivation and interest and make them approach problems more meaningfully and thoughtfully. However, as our study has shown, this does not imply that pupils automatically establish connections between those authentic mathematical tasks and other, more abstract problem situations or school-like problems that have equivalent mathematical structures. Explicit instructional attention seems necessary to bridge the gap between pupils' informal, context-bound work and the intended generalizable mathematical insights that they can rely upon on later occasions.

Acknowledgements

This research was supported by Grant GOA 2006/01 ‘Developing adaptive expertise in mathematics education’ from the Research Fund K.U. Leuven, Belgium.

Footnotes

  • ¹ In this study, we opted for problems about area (instead of volume), and more specifically about the area of squares, because this was most convenient for the intended experimental manipulations (i.e. the provision of drawings and concrete materials).
  • ² It could be argued that the observed ‘context-specific’ effect is in fact an ‘item-specific’ one, since both during the interviews and on the pre- and post-test, we assessed students' performance with a single experimental item. Although this choice was inspired by methodological considerations (hiding the experimental item in a series of distractor items), it is also a potential weakness in this study.