TEST-RETEST RELIABILITY OF THE UKRAINIAN VERSION OF THE HAEMO-QOL QUESTIONNAIRE

Background: The psychometric characteristics of the Ukrainian version of the Haemo-QoL questionnaire remain insufficiently studied. This publication describes the procedure and results of assessing the test-retest reliability of this questionnaire in children with hemophilia A. Objectives: To study the test-retest reliability of the Ukrainian version of the Haemo-QoL questionnaire for assessing health-related quality of life (HRQoL) in children with hemophilia A, and to determine whether it can be used in practical and theoretical medicine in terms of reliability over time. Methods: The quality of life assessment (QoLA) was performed using the correspondence method, by surveying 32 children with hemophilia A (self-report) and 32 parents of these children (proxy-report). The survey was conducted twice at an interval of 4-6 weeks (primary test and re-test). The Haemo-QoL version for age group I (4-7 years), II (8-12 years) or III (13-16 years) was used to measure quality of life in children with hemophilia, and the three corresponding versions were used for their parents. The correlation between primary test and re-test data was determined using Spearman's rho and Pearson correlation coefficients; differences were tested with Wilcoxon's W criterion, and Cohen's d effect size (ES) was determined for separate comparisons. Results: HRQoL indices did not differ statistically between tests on any scale (p>0.05) except the "Family" and "Others" scales (p<0.05). Total HRQoL differed statistically (p=0.0013), but with a median difference of only 0.25 and a mean absolute difference of 1.67±1.51 (5.42±2.83 %). Total HRQoL in the parent versions did not change statistically, unlike in the children's versions, where the difference between the tests was nevertheless only 1.32 with a Cohen's d ES of 0.08. For versions Ip, IIp, IIIp and IIIc there was no statistical difference; versions Ic and IIc differed statistically (p=0.038, t=-2.39 and p=0.0022, t=-3.98, respectively) with mean differences of 2.0 and 1.6, respectively. Conclusion: The Ukrainian version of the Haemo-QoL questionnaire has sufficient test-retest reliability for quantitative monitoring of HRQoL over time in patients with hemophilia A.


Introduction
In contemporary medicine, which is steadily shifting towards the biopsychosocial model, monitoring and improving health-related quality of life (HRQoL) has become one of the main components of managing children with chronic diseases, and scientific and practical work on HRQoL research remains a priority [1,2]. Dynamic monitoring of psychosocial health in patients with hemophilia, with HRQoL as one of the main indicators, is now recommended by the World Federation of Hemophilia (WFH) and the European Haemophilia Therapy Standardization Board (EHTSB) as one of the main management tasks for these patients [3,4]. Disease-specific questionnaires are the most suitable tool for determining HRQoL in patients with chronic diseases [5,6]. Among these, the most thoroughly tested and widely used in children with hemophilia is the Haemo-QoL questionnaire [7,8], whose psychometric characteristics have been shown to be adequate in numerous studies [9,10]. To date, there is no validated Ukrainian version of this questionnaire, which significantly limits the comprehensive management of patients with hemophilia in Ukraine. Cultural and linguistic adaptation of the Ukrainian test version of this questionnaire, together with a study of its validity and sensitivity, has already been carried out and reported in a separate paper. According to numerous recommendations, legitimate use of an instrument in a new ethnolinguistic environment requires a comprehensive psychometric analysis confirming its validity, reliability and sensitivity; the reliability of the Ukrainian version of the questionnaire therefore remains to be determined [11,12]. Dr. Berrie Middel et al. aptly called these three main characteristics of questionnaires (sensitivity, validity and reliability) the "holy trinity" of psychometric analysis [13].
The reliability of a questionnaire is understood as its ability to give consistent and accurate results when evaluating HRQoL. The main components of questionnaire reliability are equivalence, stability (also called time constancy, reproducibility or test-retest reliability) and internal consistency (homogeneity) [14]. In practice, stability and internal consistency are most often determined. Stability is the ability of the questionnaire to reflect the target function quantitatively in a consistent and accurate way over time. In the majority of adaptation studies stability is determined using the test-retest method; the term "test-retest reliability" (TRR) is therefore often used as a synonym for reliability over time. The method consists in establishing a correlation between the quantitative assessments of the target function in the primary and repeated surveys of the same respondent (or group of respondents), provided that between the surveys there are no objective changes in health condition that could affect the respondents' quantitative assessment of their own well-being [14,15]. Although a TRR study is recommended by most guidelines for questionnaire adaptation, the study by M. Solans et al. demonstrates that the method is rarely used in practice: among 94 quality-of-life questionnaires (30 general and 64 disease-specific) analyzed by the researchers over a 6-year period, the test-retest method was used in only 20 % of the adaptation processes [16]. Studies of this kind [9,17,18] remain extremely limited. In our opinion, studying the TRR of a questionnaire, although demanding in time and emotional effort, is a key (but not the only) step in establishing that a specific instrument is sufficiently reliable for use in practical and theoretical medicine.
We fully share the opinion of Limperg P. F. et al. that today there is no need to develop new tools for assessing HRQoL for patients with hemophilia, and the study of characteristics of the existing instruments remains a priority [10].

Aim of research
To study the test-retest reliability of the Ukrainian version of the Haemo-QoL questionnaire for assessing HRQoL in children with hemophilia, and to determine the suitability of this version, in terms of reliability over time, for use in the new ethnolinguistic environment.

Material and Method

1. Assessment of HRQoL
The ethnolinguistically adapted pilot Ukrainian long version of the Haemo-QoL questionnaire was used for HRQoL assessment. The questionnaire has six versions: three for children by age group (Ic for children aged 4-7 years, IIc for 8-12 years, IIIc for 13-16 years) and three analogous versions for parents of children of the corresponding age groups (Ip, IIp, IIIp). The versions for different age groups differ both in the number of scales and in the number of questions per scale, which varies from 2 to 10. The structure of the questionnaire is as follows: Ic/p, 8 scales/21 questions; IIc/p, 10 scales/64 questions; IIIc/p, 12 scales/77 questions. The study includes 64 forms completed by respondents in the primary survey and 64 forms completed by respondents in the repeated survey. A scale score was taken into account only if more than 50 % of the items of that scale had been answered. The HRQoL obtained by scoring the questionnaire is presented as a transformed scale score (hereinafter "TSS") calculated according to the well-known formula [19]. Higher TSS values indicate a worse quality of life and lower values a better one; admissible values range from 0 to 100, where 0 is the best and 100 the worst HRQoL. Total HRQoL for each respondent is determined as the mean of the TSS of all scales of the questionnaire. Because the distribution of most indicators differed from normal, the results are presented as Me (25 %; 75 %), where Me is the median, 25 % the first quartile (25th percentile) and 75 % the third quartile (75th percentile).
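
The scoring rules described above can be illustrated with a minimal sketch. The text cites the TSS formula [19] without reproducing it, so the code assumes the linear transformation commonly used for such scores, 100 × (raw score − lowest possible score) / (possible score range), computed over the answered items. The function names, the use of None for an unanswered item, and the item range passed by the caller (for example 1 to 5) are assumptions made for illustration only.

```python
from statistics import mean
from typing import List, Optional, Sequence


def transformed_scale_score(
    answers: Sequence[Optional[int]], min_item: int, max_item: int
) -> Optional[float]:
    """Return a 0-100 score for one scale, or None if too few items were answered.

    Assumption: linear transformation over answered items; the paper only cites [19].
    """
    answered: List[int] = [a for a in answers if a is not None]
    # The scale counts only if MORE than 50 % of its items were answered.
    if len(answered) * 2 <= len(answers):
        return None
    raw = sum(answered)
    lowest = min_item * len(answered)
    score_range = (max_item - min_item) * len(answered)
    return 100.0 * (raw - lowest) / score_range


def total_hrqol(scale_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Total HRQoL as the mean of all available scale scores (higher is worse)."""
    valid = [s for s in scale_scores if s is not None]
    return mean(valid) if valid else None


# Hypothetical respondent with two scales scored on items ranging from 1 to 5.
family = transformed_scale_score([3, None, 3], 1, 5)   # 2 of 3 items answered
others = transformed_scale_score([1, 1, 1, 1], 1, 5)   # best possible answers
print(family, others, total_hrqol([family, others]))
```

Under these assumptions a scale with exactly half of its items answered is discarded, which matches the "more than 50 %" wording; how the original scoring manual treats partially answered scales in the denominator is not stated in the text.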

2. Procedure
The reliability of the Ukrainian version of the Haemo-QoL questionnaire was studied at the Children Thrombosis and Hemostasis Center of the Western Ukrainian Specialized Children's Medical Center, Lviv. Reliability over time was determined using the test-retest method, by determining the correlation between the results of the primary and repeated surveys of the 64 respondents participating in the study. The study lasted 2 months. The sex and age distribution and the clinical and nosological structure of the respondents are presented in Table 1. The survey was conducted by the correspondence method. Joint hemorrhages were recorded by general clinical examination, retrospective analysis of patients' clinical records, and ultrasonography (US) of target joints. All participants were fluent in Ukrainian and were well informed about the purpose and course of the study.

3. Criteria of selection in the study
Prior to the repeated HRQoL survey, all respondents had to answer the question on the cover of the questionnaire: "How has your health condition changed over the last month?" with the answer options "no changes", "improved", "became worse". The criteria for inclusion in the study were as follows:
- Children with hemophilia A (factor VIII concentration in plasma <30 IU/dl), aged 4 to 16 years, and their parents (one parent per child).
- Patients who answered "no changes" to the above question.
- No dynamic changes in the clinical characteristics of the disease between the primary and repeated surveys. Joint hemorrhages during the 4 weeks before the repeated survey were chosen as the indicator of changes in clinical characteristics. To identify the subjective severity of joint hemorrhages, the respondents' answers to the following questionnaire item were evaluated: "How much have you been disturbed by hemorrhages during the last 4 weeks?" The answer options were "not at all", "somewhat", "very much" for Ic respondents; "not at all", "somewhat", "moderately", "quite a bit" for IIc and IIIc; and "not at all", "somewhat", "moderately", "quite a bit", "very much" for Ip, IIp and IIIp. The inclusion criterion was an identical answer to this question in the primary and repeated surveys. For some respondents who did not answer identically, cognitive debriefing was conducted to check their understanding of the question and the objectivity of their answers. By joint hemorrhages, the structure of the respondents was as follows: 44 respondents (22 children and their 22 parents) reported no hemorrhages in either survey; 20 respondents (10 children and their 10 parents) had recurrent joint hemorrhages (1-2 hemorrhages between the surveys) at the repeated survey but rated their subjective severity the same as in the primary survey.
- No psychosocial factors that could affect the objectivity of answers. Such factors include divorce of the patient's parents, relocation, death of relatives, change of educational institution and others that could be identified in an individual interview. To this end, parents were asked to complete an item in a supplementary form to the questionnaire on the presence or absence of such factors. This information was also collected through personal interviews.
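
For readers who want to apply the same screening to their own data, the inclusion criteria listed above can be expressed as a simple filter. This is only an illustrative sketch: the record layout and all field names are hypothetical, and the cognitive debriefing step for respondents with non-identical answers is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    """One child in the test-retest sample (all field names are hypothetical)."""
    age_years: int
    factor_viii_iu_dl: float   # plasma factor VIII concentration, IU/dl
    health_change: str         # cover question: "no changes", "improved", "became worse"
    bleed_bother_test: str     # hemorrhage item, primary survey
    bleed_bother_retest: str   # hemorrhage item, repeated survey
    psychosocial_event: bool   # divorce, relocation, bereavement, school change, ...


def is_eligible(r: Respondent) -> bool:
    """Apply the four inclusion criteria described in the text."""
    return (
        4 <= r.age_years <= 16
        and r.factor_viii_iu_dl < 30
        and r.health_change == "no changes"
        and r.bleed_bother_test == r.bleed_bother_retest
        and not r.psychosocial_event
    )


# Hypothetical example: a stable 9-year-old with identical answers in both surveys.
stable = Respondent(9, 2.0, "no changes", "somewhat", "somewhat", False)
print(is_eligible(stable))
```

Keeping the criteria in one function makes it easy to report how many respondents were excluded by each condition, which the text above does not break down.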

4. Statistical data processing
For statistical analysis, the groups were first checked for normality of distribution using kurtosis and skewness, the Kolmogorov-Smirnov and Shapiro-Wilk tests, normal probability plots and histograms. For correlation analysis, the Pearson correlation coefficient was used for normally distributed data and Spearman's rank correlation coefficient for data that deviated from normality; differences between related groups were assessed with the nonparametric Wilcoxon W criterion. Any difference with a p value <0.05 was considered statistically significant. To determine the standardized effect size (ES) of the difference between the HRQoL of the primary and repeated surveys, Cohen's d ES for nonparametric data was used. The obtained Cohen's d value was interpreted as follows: 0.1 to 0.3, a "slight difference"; 0.3 to 0.5, an "average difference"; 0.5 and higher, a "big difference" [20,21].
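
As an illustration of the nonparametric part of this analysis, the sketch below computes Spearman's rho and the Wilcoxon signed-rank sums for paired test and retest scores using only the Python standard library. It is not the software used in the study. The function names are hypothetical, the data are invented, p values are not computed, and the text does not state which convention defines the reported W statistic, so both rank sums are returned.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def wilcoxon_rank_sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W_plus, W_minus) for the paired differences y - x, dropping zeros."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Invented scores for five respondents (primary and repeated survey).
tss1 = [10.0, 20.0, 30.0, 40.0, 50.0]
tss2 = [12.0, 19.0, 33.0, 40.0, 55.0]
print(spearman_rho(tss1, tss2), wilcoxon_rank_sums(tss1, tss2))
```

In practice a statistics package would normally be used to obtain the p values, for example scipy.stats.spearmanr and scipy.stats.wilcoxon in SciPy; the hand-written version is shown only to make the ranking steps explicit.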

Results
The results of the HRQoL evaluation in the primary (hereinafter "TSS1") and repeated (hereinafter "TSS2") surveys for all respondents, without age grouping, are presented in Table 2.
As this table shows, we evaluated the statistical significance of the difference between TSS1 and TSS2 both for each scale separately and for the total HRQoL. A separate comparison of TSS1 and TSS2 was carried out for each of the six versions of the questionnaire (the results are presented in Table 3).

1. Relation of the total HRQoL between the surveys of all respondents
Total HRQoL of all groups was 32.69 (18.01; 39.81) in the primary survey and increased slightly to 32.94 (19.68; 39.68) in the repeated survey. This increase was statistically significant (p=0.0013, W=478). Although the difference in total HRQoL between TSS1 and TSS2 is statistically significant, the difference between the medians is only 0.25, which is 0.77±1.09 % of the TSS1 value. The mean absolute difference in total HRQoL between TSS1 and TSS2 is 1.67±1.51, which is 5.42±2.83 % of the total HRQoL value in the primary survey; the minimum difference is 0 and the maximum is 8.33. The per-patient difference in total HRQoL between the primary and repeated surveys, expressed relative to TSS1 ("total HRQoL difference, %"), is 4.95 (1.89; 9.46) %, with a maximum value of 66.67 % in some cases.
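
The descriptive quantities reported in this subsection (mean absolute difference with its standard deviation, minimum and maximum difference, and the per-patient difference as a percentage of TSS1) can be obtained from paired scores as in the following sketch. The function name and the data are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, median, stdev
from typing import Dict, Sequence


def summarize_differences(tss1: Sequence[float], tss2: Sequence[float]) -> Dict[str, float]:
    """Summarize absolute test-retest differences for paired total HRQoL scores."""
    abs_diffs = [abs(b - a) for a, b in zip(tss1, tss2)]
    # Percentage of the primary score; respondents with TSS1 == 0 are skipped
    # because the ratio is undefined for them.
    pct_diffs = [100.0 * abs(b - a) / a for a, b in zip(tss1, tss2) if a != 0]
    return {
        "mean_abs_diff": mean(abs_diffs),
        "sd_abs_diff": stdev(abs_diffs),      # sample SD (assumption)
        "min_abs_diff": min(abs_diffs),
        "max_abs_diff": max(abs_diffs),
        "median_pct_diff": median(pct_diffs),
    }


# Invented scores for three respondents.
summary = summarize_differences([20.0, 40.0, 50.0], [22.0, 40.0, 49.0])
print(summary)
```

Expressing the difference relative to TSS1 explains why a small absolute change can produce a large percentage for respondents whose primary score is close to zero.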

2. Relation of HRQoL scales of the questionnaire between surveys of all respondents
Analysis of the difference between TSS1 and TSS2 revealed no statistically significant difference on any scale of the questionnaire except "Family" (p=0.014, W=201) and "Others" (p=0.0097, W=53), for which the difference between the primary and repeated surveys was statistically significant. These paired comparisons were made without dividing respondents into groups by age or questionnaire version.

3. Relation of the total HRQoL between surveys by age groups
Comparison of total HRQoL in the three age groups (Ic/p, IIc/p, IIIc/p) showed no statistically significant difference between TSS1 and TSS2 in groups Ic/p (p=0.085, t=-1.81) and IIIc/p (p=0.47, t=-0.75). In group IIc/p, total HRQoL was 28.98±13.59 in the primary survey and 30.00±13.35 in the repeated survey, and this increase was statistically significant (p=0.0016, t=-3.58). However, the mean difference is 1.02 and Cohen's d ES is only 0.08.
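
The effect size quoted for group IIc/p can be checked from the summary statistics given above. The text does not specify which variant of Cohen's d was used, so the sketch assumes the classical form with the pooled standard deviation of two equally sized groups; under that assumption the result is about 0.076, which rounds to the reported 0.08. The function names and the label for values under 0.1 are hypothetical; the other bands follow the interpretation scale given in the statistical methods.

```python
from math import sqrt


def cohen_d_from_summary(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Cohen's d with a pooled SD for two groups of equal size (assumed variant)."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return abs(mean2 - mean1) / pooled_sd


def interpret(d: float) -> str:
    """Bands from the methods section: 0.1-0.3 slight, 0.3-0.5 average, 0.5+ big."""
    if d >= 0.5:
        return "big difference"
    if d >= 0.3:
        return "average difference"
    if d >= 0.1:
        return "slight difference"
    return "below the slight-difference band"  # hypothetical label


# Group IIc/p: 28.98±13.59 in the primary survey, 30.00±13.35 in the repeated survey.
d = cohen_d_from_summary(28.98, 13.59, 30.00, 13.35)
print(round(d, 2), interpret(d))  # prints: 0.08 below the slight-difference band
```

The check shows why a statistically significant paired difference can still be of little practical importance: the shift of about one point is small compared with a between-respondent spread of more than 13 points.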

4. Relation of the total HRQoL between the surveys in the parent/children questionnaire versions
Total HRQoL was compared between surveys separately for the parent versions (Ip, IIp, IIIp) and the children's versions (Ic, IIc, IIIc), without division into age groups. For the parent versions, the results of the primary and repeated surveys were very close and no significant difference was found (p=0.28, t=-1.1). In the children's group, total HRQoL in the repeated survey differed slightly from the primary survey, and this difference was statistically significant (p=0.0016, t=-3.45). It should be noted that for the children's group the mean difference is 1.32 and Cohen's d ES is only 0.08.

5. Relation of the total HRQoL between surveys in each group
We also compared the survey results for total HRQoL for each of the six versions of the questionnaire separately. For this purpose, the children and their parents were divided into groups according to questionnaire version (6 groups), in order to characterize the test-retest reliability of each age-specific version. The results are presented in Table 3.

Note to Table 3: p, probability of the null hypothesis for the difference between TSS1 and TSS2; t, t-test statistic.
As can be seen from Table 3, the parents' results showed no statistically significant difference between the primary and repeated surveys for any version of the questionnaire; the mean difference ranged from 0.15 to 0.56. In the children's questionnaires, no statistically significant difference was found only in the oldest children (version IIIc), with a mean difference of only 0.10. For the Ic and IIc versions, the difference in total HRQoL between TSS1 and TSS2 was statistically significant, with mean differences of 2.00 and 1.6, respectively.

Discussion
Before discussing the results of the HRQoL survey in 32 patients with hemophilia A and their 32 parents, the following should be noted. To date there are no, and probably cannot be, uniform criteria for evaluating the results of a TRR survey. The psychological component of hemophilia and its influence on the perception of one's own well-being is itself dynamic and labile; it is impossible to exclude all the factors that can affect it, and therefore inclusion criteria cannot completely isolate the results of the primary and repeated surveys from such influences. A difference between the results of the primary and repeated surveys therefore does not always indicate a lack of reliability of the questionnaire. A kind of "residual" sensitivity of the questionnaire to factors that were neither excluded nor taken into account in the study may also play a role. The conclusion on the degree of reliability of a questionnaire should thus be based on a comprehensive analysis of the data, including internal consistency. To assess as objectively as possible the ability of this questionnaire to give accurate and consistent results, we also determined the size of the difference between the indicators and the effect size of this difference, and analyzed the indicators separately by scale. We also investigated the internal consistency of the Ukrainian version of Haemo-QoL, but those results will be presented in another publication. The interval between surveys should also be taken into account in TRR studies. In the TRR study by S. von Mackensen et al. [38], conducted when the Haemo-QoL questionnaire was developed and implemented, the interval between surveys was 1 week, and in the study by E. Pollak et al. [34] it was 1-2 weeks, whereas in our survey it varied from 4 to 6 weeks. Time intervals between surveys should always be considered when drawing conclusions about the reliability of a questionnaire because of the so-called "memory effect". As Oladimeji A. Bolarinwa noted regarding reliability studies, '... the time period is long enough yet short in time that respondents' memories of taking the test at time 1 do not affect their scores at time 2 and subsequent test administrations ...' [26]. However, there are so far no commonly accepted optimal limits for the interval between surveys when determining reliability over time.
When HRQoL indicators were compared for each scale of the questionnaire, similar results were obtained for most scales; for the "Family" and "Others" scales a difference between the primary and repeated surveys was found, but it was small (3.12 for "Family" and 4.16 for "Others"). This probably reflects either a greater sensitivity of these scales to factors that were not taken into account (and therefore not excluded) in the study, or low reliability of the inclusion criteria used. In general, given the size of the differences, these data can be interpreted as indicating sufficient reliability for all scales of the questionnaire. Total HRQoL was found to increase in the repeated survey. However, the small median difference (0.25) allows these indicators to be interpreted as similar, especially given the possible range of 0 to 100. The most similar indicators were found in the 1st and 3rd age groups, while in the 2nd they differed between surveys. But, again, given the small mean difference and a Cohen's d ES below the "slight difference" threshold, the data from the two surveys can be considered similar. Interestingly, comparison of TRR in parents and children showed that parents give more similar scores than children, which may indicate greater reliability of the parent questionnaire. This also holds for the analysis by age group: for all parent versions of the questionnaire (Ip, IIp and IIIp) there was no significant difference in quality-of-life indicators between surveys. These data are similar to those of E. Pollak et al. [34], who demonstrated higher reliability of the parental forms, although their conclusion was based on an analysis of internal consistency. Their TRR study was conducted only in age groups II and III, where a lower intra-class correlation coefficient between surveys was demonstrated for the parent versions. In our study, by contrast, the versions for children showed lower reliability (with the exception of older children aged 13-16, where no statistical difference was found between surveys); however, the minimal difference between surveys and a "slight difference" in Cohen's d ES allow us to state that the indicators obtained in these groups are similar. It is difficult to compare our data with published studies for a number of reasons, in particular differences in the age groups included, the methods used and the often complex reliability profile. Such a comparison can be made after a complete psychometric analysis of this questionnaire.
It is also worth noting that in most questionnaire validation studies the authors draw conclusions about the reliability of newly created versions solely on the basis of an analysis of internal consistency. In our opinion, such a conclusion must be comprehensive, taking into account both internal consistency and test-retest stability over time.

Conclusions
Given that the main objective of this study was to investigate the test-retest reliability of the Ukrainian long version of the Haemo-QoL questionnaire for evaluating the HRQoL of children with hemophilia, we can draw the following conclusions:
1. The Ukrainian version of the Haemo-QoL questionnaire has been shown to have sufficient reliability over time to reflect quantitatively the change in HRQoL in patients with hemophilia A.
2. For a comprehensive description of the reliability of this version of the questionnaire, a study of its internal consistency is necessary.