Feasibility Study of the Boston Naming Test for the Arab Population

Introduction: The Boston Naming Test (BNT) is a widely used US neuropsychological test of confrontation naming, used to examine adults and children with learning disabilities and to diagnose communication disorders, aphasia, dementia, and acquired brain injury or dysfunction. The purpose of the present study is to evaluate the practicality of the original English version of the 60-item BNT (BNT-60) in an Arab population, to assess the need for a new adapted Arabic version that is sensitive to cultural biases, and to offer normative data that can serve as a reference for researchers and clinicians in the Gulf region, especially the Kingdom of Saudi Arabia (KSA). Data on the degree of familiarity with the BNT-60 items were also collected.
Methods: This research involved 105 randomly selected, cognitively healthy college students who were native Arabic speakers recruited in Jeddah. The Montreal Cognitive Assessment (MOCA) was administered with a cutoff score of 26. The participants were examined for naming accuracy, naming agreement, and familiarity using the BNT-60. The data were then analyzed and compared with the findings from studies conducted in the United States.
Results: The BNT-60 was administered to 105 university students from the KSA, and the results were compared with the BNT-60 booklet norms (second edition). Their average performance was well below the original test norms. Compared with the participants in the US studies, the participants made approximately 65% more errors on the items pretzel, wreath, beaver, harmonica, acorn, stilts, harp, hammock, knocker, pelican, muzzle, unicorn, funnel, accordion, asparagus, tripod, yoke, and trellis and 25% more errors on the items seahorse, dart, igloo, sphinx, palette, and abacus. The item "boomerang" was not compared with the US sample because of differences in the version of the BNT, but errors in naming this item were as frequent as those in naming the other misrecognized items. The internal consistency of the items' familiarity ratings was very high (α = 0.966), and a significant correlation (r = 0.837, P < 0.001) was observed between object familiarity and naming accuracy. The Arabic-speaking population in the KSA and the English-speaking population in the United States showed very different levels of familiarity with numerous items.
Conclusion: The participants' familiarity with the BNT objects varied depending on their culture and affected their naming accuracy and overall scores on the test. Accordingly, the possibility of cultural bias should be considered when administering the BNT to the population of the KSA, and changes that make the test better reflect Arab culture are suggested.


Introduction
The Boston Naming Test (BNT) is a popular confrontation naming test used in the United States for the neuropsychological evaluation of adults and children with learning impairments and for the diagnosis of communication problems and conditions such as aphasia, dementia, and acquired brain injury or malfunction. The first version of the test consisted of 85 items, and subsequent research led to the development of modified versions with 60 and 15 items. The items are black-and-white line drawings of objects that range in difficulty from common to uncommon. If a correct response is not made within 20 seconds, the examiner may provide a stimulus cue (i.e., a phonemic cue) and then show the participant a list of words on the back of the picture from which the correct one can be chosen [1][2][3][4].
To reflect the population-specific frequency of correct answers, numerous researchers have adapted the original items of the test for use with speakers of other languages. For example, Turkish researchers adopted 29 new items [5], Portuguese researchers adapted 20 items [6], and Korean researchers developed 50 new items [7]. Additionally, other English-speaking countries, including Australia and New Zealand, have produced population-specific versions of the BNT that modify two items from the original test [9]. The main reason for the development of these other versions is that the test instrument is highly sensitive to the cultural context; hence, it is insufficient simply to translate it into another language [5]. Other factors may also significantly affect BNT scores. For example, there is consistent evidence that educational level is a significant variable in this regard, and several studies have found that naming ability decreases with age, although the influence of gender remains unclear [5][6][7][8][9][10][11].
In both the Turkish [5] and Swedish [8] studies, the researchers began adapting the BNT-60 with a pilot study, with a sample of 120 participants in the former and 111 in the latter. After identifying the items that required adaptation, they developed new suggested items and then determined the norms.
Despite the strong evidence for the reliability of the BNT in measuring linguistic abilities, researchers have not yet assessed its feasibility for use in clinical and research settings in the KSA. Therefore, the primary objective of the present study is to assess the applicability of the original BNT to native Arabic-speaking populations. The secondary objective is to determine which items need to be replaced, assess the degree of familiarity (FAM) for each object on the BNT-60, and correlate the accuracy of naming the test pictures with the FAM.

Sample group
This study involved 105 healthy college student volunteers enrolled at King Abdulaziz University in the city of Jeddah whose first language was Arabic. The participants were required to meet specific selection criteria, including being a college student between the ages of 18 and 25, being born and raised in Saudi Arabia, obtaining a Montreal Cognitive Assessment (MOCA) score of at least 26, and having no history of neurological, psychiatric, or drug abuse disorders. Every participant provided written informed consent to participate in the research.

MOCA
Medical professionals use the MOCA test to assess memory loss and other signs of cognitive impairment and to screen for those who are at risk for Alzheimer's disease, other types of dementia, Parkinson's disease, brain tumors, and drug addiction. The Arabic version was prepared by Ziad Nasr Alden, a Canadian neurologist. The scores on the MOCA range from 0 to 30, and 26 or higher is considered a normal score [12].

BNT
Edith Kaplan, Harold Goodglass, and Sandra Weintraub created the first version of the BNT in 1983 with 85 items [4]. After further study, new versions were prepared, one consisting of 60 items and another of 15. The pictures used in the test are line drawings of objects ranging from common to rare, with the responses to be provided within 20 seconds. When the test is administered to healthy individuals, it begins with item number 30 on the assumption that they are familiar with the previous items. When a participant makes six mistakes, the test is to be discontinued. However, in the present study, no pictures were omitted because the aim was to evaluate the effectiveness of all 60 pictures [4].

Procedure
The Ethics Committee of King Abdulaziz University Hospital approved this study on January 15, 2023 (Reference No. 7-23). No legal barrier prevented the use of either the MOCA or the BNT in research settings.
The MOCA has open copyright and was taken from the original website (www.mocatest.org), while the BNT-60 was purchased from www.proedinc.com.
Before the study began, the participants were made aware of its goal and provided their informed consent in writing. The examiners also completed a form to collect the participants' demographic information. Each participant who met the inclusion criteria was then administered the MOCA as a screening tool before the administration of the BNT-60. The participants who achieved a score of at least 26 on the MOCA were then presented with the pictures for the BNT-60 and assessed for 1) naming agreement (NA) and 2) FAM.
Naming agreement (NA) refers to the degree of variation in the words that the participants used to identify the picture. Pictures that evoke the same name from most participants have high NA, and those that elicit a variety of names have low NA. Low NA can result either from misidentification or from the existence of several correct names for the same object [12]. In this study, we calculated the proportion of participants who gave each of the most frequent names.
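The proportion-based calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example responses are hypothetical, and the treatment of "don't know" answers (here simply counted as another response) is not specified in the text.

```python
from collections import Counter


def name_proportions(responses):
    """Return {name: proportion of participants who gave that name}.

    `responses` holds one answer per participant for a single picture.
    Assumption: every answer, including "don't know", is counted as given.
    """
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)
    return {name: count / len(responses) for name, count in counts.items()}


def naming_agreement(responses):
    """Return (most frequent name, its proportion) for a single picture."""
    proportions = name_proportions(responses)
    modal_name = max(proportions, key=proportions.get)
    return modal_name, proportions[modal_name]


# Hypothetical responses for one picture (illustrative, not study data).
example = ["harp", "harp", "guitar", "harp", "violin"]
print(naming_agreement(example))
```

A picture for which nearly all participants give the same name yields a proportion close to 1 (high NA), whereas many competing names push the top proportion down (low NA).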
Familiarity (FAM) relates to the prevalence of an object in the experience of language speakers and can be used to estimate the latency period for each object by having the participants rate the pictures on a five-point scale according to their familiarity with them. Thus, the participants were asked to rate their familiarity with the objects, with five indicating very familiar objects and one indicating the least familiar objects. We instructed them to rate the items themselves rather than the drawings of them [12][13][14].
We administered the BNT-60 starting from item 1, without adaptation of the items and in the same test order. The participants were asked to name the objects that were shown to them within 20 seconds. Unlike in the administration of the original test, we provided no cues because we aimed to assess the NA for the test items. After the 20 seconds elapsed or the participants named an item, a familiarity assessment was performed by asking them to rate the object on a scale of one to five.
For scoring, we assigned a value of one for correct Arabic responses and zero for incorrect responses, English responses, and "don't know" answers. Table 1 presents the items that the participants answered in English. Table 2 presents the items with more than one correct name in Arabic. For all of the tests, trained sixth-year medical students examined each participant individually in a single 30-45 minute session held in the classrooms and the library at the university. We then collected all of the answers using a Google questionnaire that the examiners filled out.
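The scoring rule above can be expressed as a short sketch. The function names, the answer key, and the sample responses below are hypothetical and serve only to illustrate the rule; the accepted Arabic names in the study are those reported in its tables.

```python
def score_response(response, accepted_arabic_names):
    """Score one response: 1 for an accepted Arabic name, otherwise 0.

    English names, wrong names, and missing or "don't know" answers are not
    in the accepted set, so they all score 0.
    """
    if response is None:
        return 0
    return 1 if response.strip() in accepted_arabic_names else 0


def total_score(responses, answer_key):
    """Sum the item scores for one participant.

    `responses` maps item number -> answer given (or None);
    `answer_key` maps item number -> set of accepted Arabic names.
    """
    return sum(
        score_response(responses.get(item), names)
        for item, names in answer_key.items()
    )


# Hypothetical two-item answer key and one participant's answers.
answer_key = {1: {"سرير"}, 2: {"شجرة"}}
participant = {1: "سرير", 2: "tree"}  # second answer is in English
print(total_score(participant, answer_key))
```

Because several items have more than one correct Arabic name, the key stores a set of accepted names per item rather than a single string.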

Results
The sample consisted of college students recruited from the city of Jeddah. Their mean age was 21 years (SD = 1.79; range, 18-25 years), their mean level of education was three years of undergraduate study (SD = 1.38), and 51% were female. Table 3 shows the mean scores and standard deviations for both the current study sample and the original BNT-60 norms for adults [4]. An independent-samples t-test showed a significant difference between the mean total score of the current sample and that of the original BNT norms (t(124) = 20.146, P < 0.001). The highest score recorded for the KSA population was 50, and the lowest score was 25.
Along with comparing the average scores of the two populations, we aimed to identify items that are highly sensitive to cultural bias by comparing the pattern of errors made by the participants from the KSA with the pattern of errors made by native English speakers in an earlier study of the US population, as shown in Table 4 [15]. A difference in error rate of more than 20% was considered significant. The participants from the KSA made around 65% more errors than the participants in the US study on the items pretzel, wreath, beaver, harmonica, acorn, stilts, harp, hammock, knocker, pelican, muzzle, unicorn, funnel, accordion, asparagus, tripod, yoke, and trellis, as well as 25% more errors on the items seahorse, dart, igloo, sphinx, palette, and abacus.
The US study that we used for comparison with the pattern of errors in our sample involved the old version of the BNT, which features a noose instead of a boomerang. For our study of the KSA population, we used the second edition of the BNT-60, which features the boomerang rather than the noose (Item #48) [4,15,16].
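The total-score comparison above can be illustrated with a t statistic computed from summary statistics alone. The sketch assumes a pooled-variance (Student's) test, which is consistent with the reported 124 degrees of freedom for groups of 105 and 21 (105 + 21 - 2 = 124), although the text does not state which variant was used. The function name and the means and SDs in the usage lines are hypothetical placeholders, not the values in Table 3.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a pooled-variance independent-samples t-test.

    Only group means, standard deviations, and sizes are required, which is
    all that is available when one group is a published norm.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Group sizes from the study; means and SDs are HYPOTHETICAL placeholders.
t_value, df = pooled_t_from_summary(55.0, 3.0, 21, 40.0, 5.0, 105)
print(f"t({df}) = {t_value:.3f}")
```

If the two groups had clearly unequal variances, Welch's test would be the usual alternative, but its degrees of freedom would generally not equal 124.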

As shown in Table 4, the participants from the KSA performed very poorly in identifying the drawing of a boomerang, with only one participant identifying the item correctly in Arabic (the name is shown in Table 2) and three participants responding with the English name (i.e., boomerang; Table 1).
The one Arabic answer given for Item #48, boomerang, was close to the item's Arabic name and relatively correct, so credit was given for this response, as shown in Table 2; as mentioned, no credit was given for the English responses.
The BNT-60 FAM ratings for our sample demonstrated a high degree of internal reliability, with an alpha value of 0.966. Item #1, bed, showed no variance, so it was excluded from the scale. The familiarity of the participants with each BNT-60 item was noted and compared with the results of another published study that evaluated the familiarity of 98 US students who were native English speakers with the BNT-60 images [17]. In our study, the average familiarity score for the participants from the KSA was 4.28, compared with 4.70 for the participants in the US study. As shown in Table 4, Item #19, pretzel, was considerably more familiar to the US participants than to the KSA participants. Specifically, in further analysis of items with a familiarity difference of > 0.5, the rating for Item #19 was 4.83 for the former and 3.92 for the latter. Similarly, the US participants were more familiar with seahorse. When we compared the average FAM rating of the participants in our study for the item boomerang with the overall average FAM score, we found the rating for this item to be quite low, more than 0.5 points below the average (Item #48: 3.57; KSA average: 4.28).
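The internal-consistency figure reported above is a Cronbach's alpha. A minimal sketch of that computation follows, assuming the ratings are arranged with one row per participant and one column per item. Items with zero variance are dropped before the calculation, mirroring the exclusion of Item #1. The function name and the example ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a participants-by-items table of ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    computed over the k items that show any variance.
    """
    items = [list(column) for column in zip(*ratings)]
    items = [column for column in items if variance(column) > 0]
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items with non-zero variance")
    totals = [sum(values) for values in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical five-point ratings: 4 participants x 3 items (not study data).
example = [
    [5, 4, 5],
    [5, 3, 4],
    [5, 2, 2],
    [5, 4, 4],
]
print(round(cronbach_alpha(example), 3))
```

In this example the first column is constant, so it is removed and alpha is computed from the remaining two items.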
We used Spearman's rank correlation to investigate the association between accurate naming and familiarity. The correlation coefficient of 0.837 (P < 0.001) for the relationship between naming accuracy and FAM showed that these measures were closely associated. We calculated the naming agreement rates, which are shown in Table 2, as the percentage of participants who gave each name. Alternative correct responses that were consistently given by at least 10% of participants were permitted, provided that the names accurately described the item or were listed in Arabic dictionaries as synonyms for the names.
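The rank correlation described above can be sketched without external libraries: each variable is converted to ranks (tied values share their average rank), and a Pearson correlation is computed on the ranks. The sketch assumes the analysis is done per item, pairing each item's percentage of correct responses with its mean familiarity rating; the text does not state the unit of analysis. All names and numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 1-based average over the tie group
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-item values (not study data).
percent_correct = [98.0, 12.0, 45.0, 80.0, 5.0]
mean_familiarity = [4.9, 3.1, 3.8, 4.6, 3.3]
print(round(spearman_rho(percent_correct, mean_familiarity), 3))
```

Working on ranks makes the coefficient insensitive to the different scales of the two measures (a percentage and a five-point rating).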

Discussion
The BNT is a widely used neuropsychological assessment tool for measuring an individual's ability to name objects. It is commonly applied in the assessment and diagnosis of several neurological disorders, including traumatic brain injury, dementia, and aphasia. However, it is crucial to generate adequate normative data for different demographics and to take language and cultural aspects into account to ensure an accurate interpretation of BNT results.
Numerous studies have attempted to produce normative data for the BNT among different populations, highlighting the importance of having normative data for specific languages. For example, a study focused on the adult Spanish-speaking population in Latin America provided normative data and developed a standard version of the BNT [1]. Similarly, studies conducted on the elderly Turkish and Dominican populations produced their own BNT norms [5,10].
Furthermore, linguistic and cultural variables can significantly affect BNT performance. Barker-Collo drew attention to the potential cultural bias of the BNT and discussed possible adaptations [9].
In our study, the participants were native Arabic speakers from the KSA and highly educated university students. They performed poorly on the BNT-60 in terms of naming accuracy and overall scores. They were less familiar with many of the test items than the participants in the US study with whom we compared their scores. The items "asparagus," "yoke," and "trellis" had the lowest overall scores and familiarity ratings among the KSA participants, in contrast to the US participants' average responses. This suggests that the US participants were more familiar with those items and performed better on the test. The KSA participants struggled with recognizing items related to musical instruments because of a lack of exposure. While they recognized these items as belonging to the category of musical instruments based on media exposure, they were unfamiliar with the specific names of the instruments. The responses for the items "stilts," "hammock," "abacus," and "boomerang" were similar among the participants (Table 2).
Comparing the responses of the two participant groups to the drawings of animals (e.g., camel, beaver, and pelican), we found a significant difference in both naming accuracy and familiarity ratings. Camels are common in the KSA, while beavers and pelicans are not. The KSA participants also misrecognized the items "pretzel," "beaver," "acorn," and "pelican" as "snake," "rat," "hazelnut," and "stork," respectively. Additionally, most of them identified the wreath as either a headband or flowers, indicating their inability to recognize an object that is uncommon in the KSA. These differences in familiarity ratings for BNT items reflect the cultural differences between the Arab and US populations.
It is important to acknowledge the limitations of our research. Firstly, as our study was confined to Jeddah, the applicability of the data to other Arab and Gulf regions may vary. Further research is needed to evaluate the performance and suitability of the BNT-60 in different cultural and geographical contexts.
Secondly, our research was restricted to a subset of highly educated university students within a specific age range. This sample may not accurately reflect the age distribution and educational attainment of the general population. Future research should include participants from a range of ages and educational levels to achieve a more comprehensive understanding of the BNT-60's performance among the Arabic-speaking community.
Finally, our study focused on assessing the effectiveness of the current BNT-60 items in the Arabic-speaking community, and it did not intend to produce new test items. Therefore, subsequent research could include additional objects.

Conclusions
The primary aim of the present study was to evaluate the appropriateness of the BNT-60 for Arab populations and assess the need for a new version for this population. By comparing the mean scores of college students recruited from the city of Jeddah in the KSA with the norms for the same age group from the second edition of the BNT-60, we demonstrated that a significantly different version is needed for the population represented in the study. Our secondary aim was to identify items that need to be replaced. To do so, we calculated the difficulty index and compared our findings with the findings in published papers on the US population.
Moreover, the findings presented here, which compare the familiarity of the BNT-60 items in the KSA population with their familiarity in the US population, show a correlation between FAM and naming accuracy. The KSA participants, as predicted, were less familiar with many items than the participants in the US study. This significant positive association is consistent with the differences in performance between the participants in the US and KSA studies. Future studies could further explore the possibilities of substituting more familiar objects for those less familiar to the Arab population.

TABLE 3: Mean and standard deviation by age range from the original BNT record booklet (second edition) and KSA samples.
KSA (n = 105), original BNT (n = 21). The original BNT mean and SD are taken from the published norms, which count spontaneous correct responses plus correct responses after stimulus cues, whereas the Arabic mean and SD are based on spontaneous correct responses only.

TABLE 4: Difficulty index represented as a percentage of correctly answered pictures for each BNT item.
The highlighted items are those with the lowest levels of recognition and familiarity. The item noose was replaced by a boomerang in the second edition, which we used in this study. M = mean, SD = standard deviation.
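The difficulty index in Table 4, together with the 20% criterion used in the Results to single out culturally sensitive items, can be sketched as follows. The function names and every number in the usage lines are hypothetical; items without a US counterpart (such as boomerang) are simply skipped, in line with the text.

```python
def difficulty_index(item_scores):
    """Percentage of participants who named the item correctly (scores are 0/1)."""
    if not item_scores:
        raise ValueError("at least one score is required")
    return 100.0 * sum(item_scores) / len(item_scores)


def flag_items(ksa_percent_correct, us_percent_correct, threshold=20.0):
    """Return {item: extra error rate in the KSA sample} above the threshold.

    The error rate is taken as 100 minus the percentage correct; items that
    are missing from the US data cannot be compared and are left out.
    """
    flagged = {}
    for item, ksa_correct in ksa_percent_correct.items():
        if item not in us_percent_correct:
            continue
        gap = (100 - ksa_correct) - (100 - us_percent_correct[item])
        if gap > threshold:
            flagged[item] = gap
    return flagged


# Hypothetical percentages (illustrative only, not the values in Table 4).
ksa = {"pretzel": 30.0, "camel": 99.0, "boomerang": 1.0}
us = {"pretzel": 95.0, "camel": 98.0}
print(difficulty_index([1, 0, 1, 1]))
print(flag_items(ksa, us))
```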