Computer-Generated Psychological Test Interpretation

November 30, 2015

Computer-generated interpretation of psychological tests has exploded over the past ten years.

Unfortunately, the idea that a computer is giving an objective interpretation is as false as saying that if something is on the internet, it must be true. The publishers of the psychological tests use mathematical algorithms to generate interpretive statements. The problems with these interpretations are many. Most significantly, they are based on proprietary algorithms that are known only to the publisher and the author of the instruments. It is not clear how the particular algorithms are generating their hypothetical interpretations and how well the various aspects of the test are integrated. Interpretations that are computer-generated can have contradictory statements or make interpretations which are not consistent with a full understanding of how various scales on a test are interrelated. In reference to the use of computer-generated test interpretation (CGTI) and the MCMI-III (Millon Clinical Multiaxial Inventory-Third Edition), Gould wrote:

The CGTI does not provide information about the contribution of score adjustments on clinical personality pattern scores, leading to erroneous interpretation about DSM-IV diagnoses. These diagnoses may inappropriately find their way into the evaluator’s report or testimony (2006, p. 285).

I believe that it is unethical for a psychologist to use computer-generated test interpretation as the sole means of interpreting a particular psychological test. The American Psychological Association (APA) ethics code 9.06 (Interpreting Assessment Results) states:

When interpreting assessment results, including automated interpretations, psychologists take into account the purpose of the assessment as well as the various test factors, test-taking abilities, and other characteristics of the person being assessed, such as situational, personal, linguistic, and cultural differences, that might affect psychologists’ judgments or reduce the accuracy of their interpretations. They indicate any significant limitations of their interpretations.

The APA ethics code in Section 9.09 (Test Scoring and Interpretation Services) states:

(a) Psychologists who offer assessment or scoring services to other professionals accurately describe the purpose, norms, validity, reliability and applications of the procedures and any special qualifications applicable to their use. (b) Psychologists select scoring and interpretation services (including automated services) on the basis of evidence of the validity of the program and procedures as well as on other appropriate considerations. (c) Psychologists retain responsibility for the appropriate application, interpretation, and use of assessment instruments, whether they score and interpret such tests themselves or use automated or other services.

Any psychologist who would participate in evaluating a litigant solely based on a computer-generated test interpretation would be in violation of the APA ethics code.

The AFCC Model Standards of Practice for Child Custody Evaluation section 6.6 (Use of Computer-Generated Interpretive Reports) states:

Evaluators shall exercise caution in the use of computer-based test interpretations and prescriptive texts. In reporting information gathered, data obtained, and clinical impressions formed and in explaining the bases for their opinions, evaluators shall accurately portray the relevance of each assessment instrument to the evaluative task and to the decision-making process. Evaluators shall recognize that test data carry an aura of precision that may be misleading. For this reason, evaluators shall not assign to test data greater weight than is warranted, particularly when opinions expressed have been formulated largely on some other bases.

Too often, attorneys, judges, and juries assume that computer-generated interpretations of psychological tests are more objective than the careful interpretation done by the psychologist who is conducting the evaluation. Assuming the psychologist conducting the evaluation is competent to interpret psychological tests, the CGTI is not more objective and is certainly not as well integrated with the actual person being evaluated.

Psychological tests are not as objective as many in the legal community believe. At best, psychological tests describe characteristics shared by a sample of people whose score configurations on the test resemble the particular litigant’s score configuration. It is possible for two people to have exactly the same score on the Depression Scale of the MMPI-2 and yet express their mood quite differently. One person may be crying, withdrawn, and suicidal, while another may feel helpless and useless but still function on a day-to-day level. The proper forensic investigative protocol of multiple methods would assist in correlating the hypotheses developed using the psychological tests with the actual functioning of the individual. It is only through the integration of the test results with the other pieces of information that an evaluator can come to conclusions that are to a psychological certainty.

Some in the legal community have proposed relying on computer-generated interpretations of psychological tests in lieu of full psychological evaluations. This idea is usually suggested as a cost-effective strategy to obtain answers about an individual’s mental status. While I understand the desire for less costly assessments, this approach fails to recognize the inherent problems in relying on CGTI. While the computer administration and scoring of psychological tests provides a cheaper alternative to psychological evaluations, the computer’s interpretation of these tests does not integrate any of the information necessary to make an accurate, valid, and ethically sound evaluation of an individual. There would be no way to say to a psychological certainty that what the computer interprets accurately reflects the individual under investigation. I doubt most attorneys would say that a LegalZoom computer-generated legal document is the most comprehensive way for individuals to manage their legal needs. I assume a competent attorney’s evaluation of the legal needs of a client is more comprehensive than the results of some online questions that a computer integrates into a document template.

It would be unethical for a psychologist to participate in an evaluation that consisted of a client taking a psychological test and using the computer-generated interpretation as the sole means of understanding that client’s mental status. The psychologist would be “rubber stamping” the computer-generated interpretation and not integrating any of the test data with all of the other pieces of relevant information. I would question how an interpretation of unknown validity and reliability could be admitted as expert testimony in a legal case. Radiologists have computerized screening programs for mammograms, x-rays, MRI, and CT scans. Nevertheless, they rely on their own interpretation of the significance of any findings. A diagnosis, or lack of a diagnosis, is never determined solely on the computer’s interpretation. The radiologist has to correlate the findings of the computer with other information specific to the patient. The same can be said of the psychologist who is ethically interpreting psychological tests.

Non-psychologists can easily misinterpret CGTI reports. On the MCMI-III, for example, a high proportion of custody litigants would have elevated scales on Histrionic, Compulsive and Narcissistic Personality traits. These findings would lead to many healthy litigants being mislabeled as having a Personality Disorder. That is why many researchers have recommended that the MCMI-III not be used in child custody evaluations.

As an attorney, you should never allow a computer-generated test interpretation of your client that has not been reviewed by a competent psychologist to be used as part of any forensic or clinical evaluation. Furthermore, the test should never be used in isolation from other data.

Forensic psychologists use interviews, psychological tests, document reviews, and collateral interviews to gather data, which are ultimately integrated into a cohesive picture of the client’s mental status and perhaps parental fitness. Each piece of information adds to the total understanding of the client. This multi-method approach to forensic evaluations leads to a convergence of information from clinical interviews, test results, collateral sources, etc. This is the best way to reach a valid and reliable finding about a litigant. Testing is one piece of the process and one source of data. Most attorneys are surprised to find out how little the actual psychological testing adds to the overall understanding of the individual under investigation. Although it typically adds very little to the total data reviewed by the psychologist, the results are occasionally pivotal to fully understanding the client.

Since the underlying algorithms behind computer-generated test interpretation are not known, the court would have no way to know the relationship between the data and the expressed opinion (the computer-generated test interpretation print-out). Since there is no way to directly connect the opinion in the computer-generated interpretation to the underlying data, there would be potential admissibility issues. Jay Flens, Ph.D. (2005) wrote in reference to expert testimony (which would include the interpretation of psychological testing):

In 1997, the U.S. Supreme Court further extended their thinking on Daubert in General Electric Co. v. Joiner (1997). The Joiner decision focused attention on the need for the expert to show how opinions expressed were connected to the data upon which the opinions are based. No longer was an expert’s say-so appropriate. An expert had to show a relationship between reliable data and expressed opinion (2005, p. 12).

The CGTI does not provide the connection between the underlying “interpretation” and the data or research that justifies the interpretation.

For a psychological test to be appropriate in a specific case, the test must be reliable and valid. The legal community’s use of the term reliable refers to what social scientists term validity. Jay Flens, Ph.D., has published extensively on psychological testing in child custody evaluations (2005). He defines reliability as referring “… to the consistency of results, including but not limited to consistency across time, situation, and evaluator; it asks the question, ‘Does the test consistently measure what it purported to measure?’” (p. 5). Dr. Flens defines validity as referring to “… the accuracy of the test; it answers the question, ‘Does the test accurately measure what it is purported to measure?’” (p. 5). Just because a test is reliable does not mean it is valid for use in a particular situation. Some tests are only valid to use with very specific populations. Even on well-known forensic psychological tests, like the MMPI-2, some scales should only be interpreted with very specific populations. For example, the Over Controlled-Hostility Scale on the MMPI-2 is valid for use among prisoners, but not child custody litigants. Nevertheless, the CGTI report gives interpretations of that scale, even in child custody cases.

The MCMI is currently in its third edition (MCMI-III). The test has norms that are based on a clinical population. They are not based on samples of child custody litigants. Furthermore, until the past couple of years, the test had separate norms for women and men. Unfortunately, women would routinely obtain extremely high scores on the scales “Histrionic” and “Compulsive” relative to men taking the same test. The publisher has “re-normed” the test to do away with separate norms for men and women. Unfortunately, most of the research that has used the MCMI-III is based on the old norms.

When given to people with a reason to respond in a highly defensive and desirable way, the MCMI-III will often yield scale elevations on the “Histrionic,” “Compulsive,” and “Narcissistic” scales. Individuals undergoing child custody evaluations and other forensic evaluations are often attempting to present a positive picture of their personality and functioning. A litigant who is consciously or unconsciously engaging in positive impression management while taking the test will likely have those three scales elevated. Halon (2001), writing in the American Journal of Forensic Psychology (“The Millon Clinical Multiaxial Inventory-III: The Normal Quartet in Child Custody Cases”), referred to these three scales (Histrionic, Compulsive, and Narcissistic), together with the Desirability scale, as the “normal quartet.” Jonathan Gould wrote in reference to the normal quartet:

The three personality scales have several items in common, and each scale is highly correlated with the Desirability Scale. Thus, a common finding among parents undergoing child custody evaluations is that various combinations of the fake good triad will be elevated in a child custody context, some of which may represent healthy and adaptive personality traits and some of which may represent maladaptive and pathological states (2006, p. 284).

Unfortunately, the untrained eye would easily see these elevated scales as signs of a personality disorder or traits, when they may be indicative of healthy functioning. Those of us who review the reports of other psychologists have unfortunately seen too many evaluators who have misdiagnosed a litigant with a histrionic, narcissistic, or compulsive personality disorder. This was due to improper interpretation of the scale elevations without taking into consideration the context in which the test was administered and the population of test takers.

Most attorneys and judges who have worked with psychologists or heard the testimony of psychologists have heard terms such as standard deviations, T-scores, and z-scores. These describe how far a score falls from the statistical mean of a comparison group. The T-score is a standardized score scaled so that the normative mean is 50 and the standard deviation is 10, which is the scale many attorneys and judges know from the MMPI-2. A T-score above 65 deviates significantly from the norm. On the MCMI-III, T-scores are not used. Instead, the test utilizes Base Rate Scores, which approximate the presence of the disorder or traits in the population. A base rate of 75 is normal. An untrained individual looking at an MMPI-2 profile and an MCMI-III profile is likely to confuse the two very different types of scores. A score of 75 on a scale of the MMPI-2 is significant, while a base rate score of 75 on the MCMI-III is not.
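For readers who want to see the T-score arithmetic, here is a minimal sketch. It assumes a simple linear T-score and a normally distributed norm group; the raw-score mean of 20 and standard deviation of 4 are made-up numbers for illustration, not values from any published test, and the MMPI-2's actual scoring tables are not reproduced here. Nothing comparable is shown for Base Rate Scores, because they are not built this way.

```python
from statistics import NormalDist


def t_score(raw, norm_mean, norm_sd):
    """Linear T-score: rescale a raw score so the norm group has mean 50, SD 10."""
    z = (raw - norm_mean) / norm_sd  # z-score: distance from the mean in SD units
    return 50 + 10 * z


def share_above(t, dist=NormalDist(mu=50, sigma=10)):
    """Fraction of a normally distributed norm group scoring above a given T-score."""
    return 1 - dist.cdf(t)


# Hypothetical norm group: mean raw score 20, standard deviation 4.
print(t_score(26, 20, 4))          # a raw score 1.5 SD above the mean
print(round(share_above(65), 3))   # share of the norm group above T = 65
```

Under these assumptions, a T-score of 65 sits one and a half standard deviations above the mean, and only about 7 percent of the norm group would score higher. That is the sense in which a score of 65 is unusual on a T-score scale, and it is exactly the reasoning that cannot be carried over to a Base Rate Score of the same numeric value.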

A non-forensic psychologist, attorney, or judge reading the CGTI on the MCMI-III may conclude, wrongly, that an individual has a serious mental disorder such as a Narcissistic, Compulsive or Histrionic Personality Disorder. The reality may be the client who took the test was defensive and trying to show him or herself to be in the best possible light. This type of defensive, positive impression management response is all too common in family law litigation.


American Psychological Association. (2010). Guidelines for Child Custody Evaluations in Family Law Proceedings.

American Psychological Association. (2002). Ethical Principles of Psychologists and Code of Conduct. American Psychologist, 57(12), 1060-1073.

Association of Family and Conciliation Courts. (2009). Model Standards of Practice for Child Custody Evaluations. Family Court Review, 45(1), 70-91.

Flens, James R. (2005). The Responsible Use of Psychological Testing in Child Custody Evaluations: Selection of Tests. Journal of Child Custody, 2(1-2), 3-29.

Gould, Jonathan W. (2006). Conducting Scientifically Crafted Child Custody Evaluations (2nd ed.). Sarasota, FL: Professional Resource Press.

