2.9. Instrumental Research
According to Ziman (2002), instrumental research means "the production of knowledge with clearly foreseen or potential uses". Research that is subordinated to a concrete purpose of application and utilization of the knowledge sought qualifies as instrumental. The practice and norms of instrumental research are almost the opposite of those of academic science. Being normally funded by contracts rather than by patronage, instrumental science is so captive to material interests and commercial agendas that it is partisan rather than objective in its judgments.
Its findings are exploited as intellectual property, and are thus proprietary rather than public.
Because it serves specific power groups and technical elites, it tends to produce "local" rather than universal knowledge (Wilholt, 2006). Processes in instrumental research are guided by design rules. For instrumental research to promote the enterprise at which it aims, these design rules must be strictly adhered to by the researcher (Wilholt, 2006). The generation of items, trial testing, and the estimation of psychometric properties to establish the validity and reliability of the instrument are significant steps in the process of instrumental research.
Instrumental research uses stakeholders as sources of data; in this study, lecturers and students served this role. The information gathered from these stakeholders through scientific data-collection methods is translated into scientific knowledge by means of statistics or other quantitative methods.
Several procedures are used by researchers to assess the validity of measures, and good research always evaluates any measure used by applying at least some of these procedures (Zeller & Carmines, 1980). Validity refers to the extent to which a scale measures exactly what it is being developed to measure. Falaye (2008) stressed that "to ensure validity, the counselor or evaluator must ensure that the items in the instrument cover a representative sample of the entire content area, which may be cognitive, affective and psychomotor". Theresa (2006) defined validity as the appropriateness, meaningfulness, and usefulness of the specific inferences researchers make based on the data they collect. She noted that it is possible to have highly reliable instruments that are useless. For a measure to be useful, Adegbuyi (2011) suggested that "it must be reliable and valid". The ways of determining validity are discussed below.
Face validity involves assessing whether a logical relationship exists between the variable and the proposed measure. Essentially, it amounts to a rather commonsense comparison of what comprises the measure and the theoretical definition of the variable: does it seem logical to use this measure to reflect that variable? We might, for example, measure child abuse in terms of the reports made by physicians or emergency room personnel of injuries suffered by children. It does seem logical that an injury reported by such people might reflect actual abuse; however, this is not a perfect measure, because health personnel might be wrong. No matter how carefully done, face validity is clearly subjective in nature: all we have are logic and common sense as arguments for the validity of a measure. This makes face validity the weakest demonstration of validity, and it should usually be considered no more than a starting point. All measures must pass the test of face validity; if they do, we should attempt one of the more stringent methods of assessing validity.
An extension of face validity is called content validity or sampling validity. It has to do with whether a measuring device covers the full range of meanings or forms that would be included in the variable being measured. In other words, a valid measuring device provides an adequate or representative sample of all the content, elements, or instances of the phenomenon being measured. For example, in measuring teaching effectiveness, it would be important to recognize that several indicators, such as mastery of subject matter, teaching style, communication skills, and organization, together make up teaching effectiveness. A valid measure of teaching effectiveness takes all these indicators into consideration.
Validity can also be established by showing a correlation between a measuring device and some other criterion or standard that we know or believe accurately measures the variable under consideration. Alternatively, we might correlate the results of the measuring device with some properties or characteristics of the variable the measuring device is intended to measure. For example, a scale intended to measure teaching effectiveness should correlate with the achievement of students if it is to be considered valid. The key to criterion validity is to find a criterion against which to compare the results of our measuring device. Criterion validity moves away from the subjective assessments of face and content validity and provides more objective evidence of validity. One type of criterion validity is concurrent validity, in which the instrument being evaluated is compared to some already existing criterion, such as the results of another measuring device. If I had developed a new scale on teaching effectiveness, for example, I could compare its results to the results from existing scales on teaching effectiveness.
A second form of criterion validity is predictive validity, in which an instrument is used to predict some future state of affairs. Sometimes the future state of affairs used to validate the measure is too far in the future, and an earlier assessment of validity is desirable. If so, a variation on predictive validity, called the known-groups approach, can be used. If it is known that certain groups are likely to differ substantially on a given variable, a measure's ability to discriminate between these groups can be used as an indicator of its ability to predict who will be in these groups in the future.
The most complex of the types of validity discussed here involves relating a measuring instrument to an overall theoretical framework in order to determine whether the instrument confirms a series of hypotheses derived from an existing and at least partially verified theory (Cronbach & Meehl, 1955; Zeller & Carmines, 1980). The question is not simply how scores relate to any single criterion, but how they relate to measures of concepts derived from a broader theory. This is the principle behind construct validity, which is built up through numerous comparisons with a variety of concepts derived from theory. For example, Murray Straus and his colleagues in 1996 developed the Conflict Tactics Scale (CTS) to measure how partners resolve conflicts in relationships. It is partly a measure of the use of psychological and physical aggression, but it also measures forms of conflict resolution in general. The CTS consists of a number of subscales, and Straus and colleagues assessed the construct validity of these subscales.
In addition to validity, measures are also evaluated in terms of their reliability, which refers to a measure's ability to yield consistent results each time it is applied. Determining reliability requires reliability testing to ascertain both the stability and the internal consistency of the research instrument. Stability, or "test-retest reliability", is determined by using a reliability coefficient to discover the consistency of results obtained on more than one administration of the instrument. The usual interval is 2 to 3 weeks. The reliability coefficient is "the correlation coefficient between the two sets of scores" (Polit & Beck, 2004). While attitudes tend to remain stable, be aware that knowledge can change by the second administration as a direct result of the first administration. For internal consistency, the most widely used method is the calculation of coefficient alpha, or Cronbach's alpha.
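For reference, the standard formulation of Cronbach's alpha (the notation below is generic rather than taken from the sources cited above) for an instrument of k items with item variances \(\sigma_i^2\) and total-score variance \(\sigma_X^2\) is

\[
\alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\]

so that alpha increases as the items covary more strongly relative to the variance of the total score.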
Chapman (2003) defined reliability as the extent to which a measurement is repeatable with the same results. Reliability, according to Falaye (2008), refers to consistency between two sets of scores obtained or observations made using the same instruments. Bamidele (2004), who says a reliable scale should give the same measurement over and over again, concluded that the reliability of a measuring instrument is the degree of consistency in the responses of the respondents on different occasions. In other words, reliable measures do not fluctuate from time to time unless the variable being measured has changed. In general, a valid measure is reliable. The reliability of a test is intended to specify the degree of accuracy, dependability or consistency with which the test measures the variable it is designed to measure (Thorndike, 1990).
The concept of reliability has been under continuing reformulation and redevelopment, with a resulting increase in clarity and range of applicability. Good (1996) advances three reasons for estimating a reliability coefficient: guiding test selection, supporting inferences about test scores through the standard error of measurement, and supporting inferences about the validity of a perfectly reliable test.
Methods of estimating the reliability of an instrument are of four types. The first and most generally applicable assessment of reliability is called "test-retest". As the name implies, this technique involves applying a measure to a sample of individuals and then, somewhat later, applying the same measure to the same individuals again. After the retest, there are two scores on the same measure for each person. As a matter of convention, a correlation coefficient of .80 or more is normally necessary for a measure to be considered reliable. If a reliability coefficient does not achieve the conventional level but is close to it, the researcher must make a judgment
about whether to assume the instrument is reliable (and that the low coefficient is due to factors other than the unreliability of the instrument) or to rework the instrument in order to obtain a higher level of association.
In actual practice, the test-retest method sometimes cannot be used quite as simply as suggested, because exposing people to the same measure twice creates a problem known as "multiple-testing effects" (Campbell & Stanley, 1963). A group of people may not react to a measure the second time in the same way as they did the first. They may, for example, recall their previous answers, and this could influence their second responses. Students might recall the responses they gave on the first administration and maintain consistency, or purposefully change responses for the sake of variety. Either case can have a confounding effect on testing reliability. The most serious problem with the test-retest method is thus the actual memory of particular items and of previous responses to them.
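As an illustration only (the scores and variable names below are hypothetical, not drawn from this study), the test-retest coefficient described above can be computed as the Pearson correlation between the two administrations and checked against the conventional .80 threshold:

```python
# Illustrative sketch: test-retest reliability as the Pearson correlation
# between scores from two administrations of the same instrument.
# The scores below are hypothetical.
import numpy as np

time1 = np.array([42, 35, 50, 28, 44, 39, 47, 31])  # first administration
time2 = np.array([40, 37, 49, 30, 45, 36, 48, 33])  # retest, 2-3 weeks later

# Correlation coefficient between the two sets of scores
r = np.corrcoef(time1, time2)[0, 1]

# Conventional threshold discussed above
print(f"test-retest r = {r:.2f}; meets .80 convention: {r >= 0.80}")
```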
The parallel-forms estimate of reliability is the correlation between observed scores on two parallel tests. When developing the scale, two separate but equivalent versions made up of different items are created. These two forms are administered to the same individuals at a single testing session. The results from the two forms are correlated with each other, as in the test-retest method, using an appropriate statistical measure of association, with the same convention of r = .80 required for establishing reliability. If the correlation between the two forms is sufficiently high, we can assume that each scale is reliable.
The advantage of multiple forms is that only one testing session is required and no control group is needed. This may be a significant advantage if either repeated testing sessions or the use of a control group is impractical. In addition, one need not worry about changes in the variable over time, because both forms are administered during the same testing session. The problem with this method, however, is that in preparing parallel-form tests there is the danger that the two forms will vary so much in content and format that each will have substantial specific variance distinct from the other.
In the split-half method of estimating reliability, the test group responds to the complete measuring instrument. The items that make up the instrument are then randomly divided into two halves. Each half is then treated as though it were a separate scale, and the two halves are correlated by using an appropriate measure of association. Once again, a coefficient of r = .80 is needed to demonstrate reliability.
Internal consistency reliability is estimated using only one test administration, and thus avoids the problems associated with repeated testing. One complication in using the split-half reliability test is that the correlation coefficient may understate the reliability of the measure because, other things being equal, a longer measuring scale is more reliable than a shorter one. Because the split-half approach divides the scale in two, each half is shorter than the whole scale and will therefore appear less reliable than the whole scale. To correct for this, the correlation coefficient is adjusted by applying the Spearman-Brown formula.
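For reference (this is the standard form of the correction, with generic notation rather than symbols taken from the sources cited above), if \(r_{hh}\) is the correlation between the two halves, the Spearman-Brown estimate of the full-scale reliability is

\[
r_{SB} \;=\; \frac{2\, r_{hh}}{1 + r_{hh}},
\]

so that, for example, a half-test correlation of .70 is corrected to 2(.70)/(1 + .70) ≈ .82 for the full-length scale.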
The split-half reliability test has several advantages. It requires only one testing session, and no control group is required. It also gives the clearest indication of reliability. For these reasons, it is the preferred method of assessing reliability when it can be used. The only disadvantage, as noted, is that it cannot always be used.
A study that established the foundation of Instrumental Enrichment (IE) research was conducted by Feuerstein and his colleagues with a population of five hundred socially and culturally disadvantaged Israeli adolescents (Feuerstein, Rand, Hoffman, & Miller, 1980; Rand, Tannenbaum, & Feuerstein, 1979). The main research hypothesis was that the cognitive performance and school achievement of students who received two years of the IE program would be higher than those of matched groups of students who received the same amount of general enrichment lessons. The pre- and post-test measures included Thurstone's Primary Mental Abilities Test and a specially designed curriculum-based Achievement Battery. The results confirmed the main hypothesis: IE group students showed significantly better results on the post-tests. In the cognitive area, better results were achieved on the spatial relations, figure grouping, numbers, and addition sub-tests. In the curriculum-based tasks, IE group students performed significantly better in Geometry and Bible Studies. A follow-up study (Rand, Mintzker, Miller, Hoffman, & Friedlender, 1981), conducted two years after the end of the IE intervention, demonstrated that IE group students continued to perform better than control-group students on both verbal and non-verbal cognitive tests.
A large-scale external validation of the IE program was conducted by the authors of the IE program in Venezuela (Ruiz, 1985; Savell, Twohig, & Rachford, 1986). In the study, adolescent students from higher and lower socio-economic status (SES) groups participated for
two years in the IE program. The effectiveness of the IE program was assessed with the help of pre- and post-tests of general intellectual abilities, academic performance in mathematics and language, and self-concept. The experimental IE group (318 students) was compared to a control group of equal size. Statistically significant gains for the IE group were observed in all three spheres: general intellectual abilities, academic performance, and self-concept. Before the intervention, the higher-SES group showed higher results in all three spheres. Some difference remained after the intervention, but both groups improved their performance. In intellectual abilities both groups benefited equally, while in academic performance the higher-SES group benefited more. It is interesting that pre-test differences in self-concept disappeared after the intervention.