Evaluation of candidate reference genes for quantitative real-time PCR analysis in a male rat model of dietary iron deficiency

Background Quantitative real-time polymerase chain reaction (qPCR) is a reliable and efficient method for quantitation of gene expression. Due to the increased use of qPCR in examining nutrient-gene interactions, it is important to examine, develop, and utilize standardized approaches for data analyses and interpretation. A common method used to normalize expression data involves the use of reference genes (RG) to determine relative mRNA abundance. When calculating the relative abundance, the selection of RG can influence experimental results and has the potential to skew data interpretation. Although common RG may be used for normalization, often little consideration is given to the suitability of RG selection for an experimental condition or between various tissue or cell types. In the current study, we examined the stability of gene expression using BestKeeper, comparative delta quantitation cycle, NormFinder, and RefFinder in a variety of tissues obtained from iron-deficient and pair-fed iron-replete rats to determine the optimal selection among ten candidate RG. Results Our results suggest that several commonly used RG (e.g., Actb and Gapdh) exhibit less stability compared to other candidate RG (e.g., Rpl19 and Rps29) in both iron-deficient and iron-replete pair-fed conditions. For all evaluated RG, Tfrc expression significantly increased in iron-deficient animal livers compared to the iron-replete pair-fed controls; however, the relative induction varied nearly 4-fold between the most suitable (Rpl19) and least suitable (Gapdh) RG. Conclusion These results indicate the selection and use of RG should be empirically determined and RG selection may vary across experimental conditions and biological tissues.


Background
Iron is an essential nutrient and is involved in many mammalian processes including DNA synthesis, erythropoiesis, ATP production, and oxygen transport [1,2]. In humans, iron deficiency (ID) remains the most common single nutrient deficiency and affects approximately 25% of the world's population or 1.62 billion people according to the World Health Organization [3]. Due to its importance in biological functions, inadequate levels of iron lead to microcytic anemia, diminished cognitive development, and decreased ATP production [1,4].
A variety of biomarkers and methodologies exist to investigate iron status; for example, measuring serum ferritin and transferrin saturation are common practices and often employed together to enhance the detection of systemic ID [5]. To investigate the iron content of biological tissues, inductively coupled plasma mass spectrometry is a useful strategy due to its low detection limits [6]. In many instances, however, indirect measures are needed to further understand iron homeostasis. In these instances, the addition of immunoblotting, quantitative real-time PCR (qPCR), and iron regulatory protein (IRP) RNA-binding assays can be utilized to determine the abundance of proteins such as ferritin and transferrin receptor [7], the gene expression of mRNA encoding proteins such as transferrin receptor or hepcidin [8], and IRP-binding activity, respectively [9]. Of these approaches, qPCR has become the gold standard for evaluating gene expression due to its sensitivity, accuracy, and simplicity [10,11]. Therefore, fully understanding this technique and standardizing the methods, along with analyzing and interpreting qPCR results, are of great importance.
To compare differences in gene expression (i.e., mRNA abundance) between experimental groups, qPCR is used by applying Kary Mullis' novel method of amplifying DNA and using probe-based chemistries [11][12][13]. Following exposure to experimental conditions, there are 4 major steps to successfully complete qPCR: (1) harvest quality RNA from experimental groups, (2) reverse transcribe RNA templates into complementary DNA (cDNA), (3) amplify cDNA with probe-based chemistries by qPCR, and (4) quantify relative mRNA abundance. First, it is essential that RNA integrity is maintained during isolation and purification as poor quality RNA may compromise experimental results [14]. Second, since qPCR amplifies only DNA by taking advantage of DNA polymerases, the quality RNA must be reverse transcribed into cDNA by the enzyme reverse transcriptase [11]. The third step, amplification of the cDNA, utilizes fluorescence-based molecules that bind to DNA and fluoresce following excitation. As each qPCR cycle is repeated, new copies of the cDNA template are generated. Thus, the fluorescence signal is directly proportional to the abundance of DNA. Finally, to quantify relative mRNA abundance, it is important to control for sample-to-sample variation by one of two strategies, the standard curve method or the comparative CT method. Both methods amplify a target gene and a control gene. The standard curve method expresses the relative mRNA abundance to the standard curve of the calibrator (i.e., untreated sample) and the comparative CT method normalizes the threshold cycle values of the target gene to the control gene before comparing experimental groups.
The normalization process accounts for discrepancies in RNA isolation, reverse transcription, and qPCR [15,16]. Normalization utilizes invariant control genes that are typically referred to as "housekeeping" or "reference" genes (RG) [17]. Ideally, RG have minimal variation in tissue or cell type and under different experimental conditions; thus, RG are considered stable. Interestingly, many RG have been reported to be regulated by experimental conditions or tissue type [18] and subsequently influence gene expression interpretation [15].
To date, there are many studies on RG selection for a number of animal and cell models [19][20][21][22]; however, there is limited data regarding RG selection in animal models of dietary conditions. Consequently, the absence of a systematic approach to RG selection makes gene expression data potentially difficult to interpret and compare between studies, and therefore less reliable. For instance, Suzuki et al. [23] reported Gapdh and Actb were used as RG in more than 60% of articles they reviewed in high-impact journals. While these RG may have been appropriate in those studies, both are affected by hypoxia and cell cycle maturation [24,25], and tissue type [26]. Furthermore, a commonly used iron-chelating reagent, desferrioxamine, is considered a hypoxiamimetic which may also regulate these RG. Some progress has been made in terms of RG selection in certain models, though the extent to which these results can be applied to all models remains unclear [16,21]. The focus of this study was to examine RG stability in a weanling rat model of dietary ID and to determine appropriate RG for use in qPCR. Additionally, the extent to which these RG were responsive to dietary ID was assessed. We examined the stability of gene expression in ten commonly used RG in qPCR (Actb, Gapdh, Hprt, Ppia, Rpl19, Rpl22, Rpl27, Rplp0, Rps29, and Tbp) for their candidacy to be used when comparing iron-deficient and iron-replete pair-fed (PF) rat experimental conditions. RG stability was also determined for individual tissues including the gastrocnemius, heart, kidney, liver, lung, and spleen under the same experimental conditions. Using four RG computational programs (BestKeeper, comparative delta quantification cycle (ΔCq), NormFinder, and RefFinder), we analyzed the gene stability to predict the most suitable RG for studying the effects of dietary ID on the regulation of gene expression [2,20,22,27].

Results
Animal anthropometric data and iron status measurements throughout the study are published elsewhere [28]. In summary, the ID group exhibited greater than 50% reduction in hemoglobin, hematocrit, and serum iron levels compared to both the control (C) and PF groups. ID animals weighed~20% less than the C group; therefore, to control for the total diet consumed, the PF group was fed an iron-sufficient diet to the level of the ID group's daily consumption. Importantly, there were no differences in final body weight or rate of weight gain between PF and ID groups. These results are consistent with previous findings indicating that ID animals exhibit decreased food intake and lower body weight compared to C animals [29]. All RG analyses were made utilizing the PF group instead of the C group to account for any non-specific changes due to unequal food intake.

BestKeeper analysis
BestKeeper software analysis ranks RG based on a pairwise correlation and then calculates the most suitable RG based on geometric means assessing crossing points (CP) or Cq. Among potential RG examined, if the criterion (SDCq value < 1.0) was met, RG were considered suitable for qPCR normalization [27]. Interestingly, when analyzing RG in both experimental groups (PF and ID) in individual tissue, all RG except one exhibited stability based on the criteria (data not shown). Rplp0 failed to meet the criteria in heart tissue (SDCq = 1. 2). After analyzing each experimental group individually using an all tissue combined approach, BestKeeper analyses indicated high variation in RG expression in the PF group with only Hprt meeting the criteria (Table 1) and moderate variation in the ID group with five of the candidate genes Hprt, Rps29, Tbp, Rpl19, and Rplp0 having an SDCq value < 1.0 (Table 2). Finally, when combining datasets from all tissues and both experimental groups to determine which RG exhibits the least amount of variability, Hprt and Rpl19 displayed the most stability (Table 3). Interestingly, two commonly used RG in the rat model of ID and other nutrition models, Actb and Gapdh, exhibited poor stability with Actb having the least stability in all BestKeeper analyses [30][31][32].

Comparative ΔCq analysis
Gene expression levels were analyzed for stability using the comparative ΔCq method and standard deviations (SD) [22]. Pairwise comparisons were utilized to determine ΔCq of the relative gene expression within individual tissues and in all tissues combined. Mean ΔCq and SD were then averaged to interpret RG stability values for each experimental condition individually (PF and ID) and combined experimental conditions stability among all tissues. Similar to Silver et al.'s results, certain genes exhibited increased or decreased levels of deviation in ΔCq among all tissues and experimental condition analyses [22]. Those genes calculated to have the lowest mean SD were interpreted as having the most stability as a RG. After examining treatment conditions separately, Rpl22 and Hprt exhibited the most stability in PF animals and Rpl19 and Ppia exhibited the most stability in ID animals. Finally, when combining datasets from each tissue and both experimental groups to determine which RG exhibits the most stability, Rpl19 and Actb had the lowest mean SD and therefore the most stability, while Rplp0 and Ppia had the highest SD or least stability ( Table 4).

NormFinder analysis
In contrast to the BestKeeper software, NormFinder determines suitability of RG as a function of variability. NormFinder software ranks potential RG using a model-based approach. The methodology examines sample subgroups (PF and ID herein), disparity in intra-and intergroup expression, and from these data calculates a stability value for candidate RG [2]. RG were assessed first in each tissue individually and then in all tissues combined to determine appropriateness of a single RG for use in all tissues. Among individual tissues, the most stable RG were Rps29 in the heart, Tbp in the kidney and lung, Rpl27 in the liver, and Ppia in the gastrocnemius and spleen. Exhibiting the least stability, Actb ranked poorly in nearly all tissues ( Fig. 1A-F). After combining data from the six individual tissues, Rps29 and Rpl27 were identified as the most stable RG and Hprt and Gapdh as the least stable RG using Norm-Finder (Fig. 2). Geometric mean (CP), coefficient of variance (CV), and standard deviation (± CP) of the Cq values for putative reference genes. RG are ordered from left to right according to their SD Cq value . Reference genes with a SD Cq value < 1.0 are considered to be an appropriate reference gene when assessing gene expression in the pair-fed animals. To determine the under-and over-expression of a reference gene relative to the gene's geometric mean (x-fold), the min, max, and standard deviation are calculated by the BestKeeper software

RefFinder analysis
RefFinder is a software program that utilizes multiple established algorithms (BestKeeper, ΔCq, geNorm, and NormFinder) to calculate a comprehensive RG stability value [33]. Each gene is assigned a weight based on each algorithm's geometric mean and weights are then combined to conclude the overall RefFinder ranking. In the individual tissues, the most stable RG were Hprt in the heart, Rps29 in the kidney, Rplp0 in the lung, Rpl27 in the liver, and Ppia in the gastrocnemius and spleen ( Fig.  3A-F). After combining the six tissues, Rpl19 and Rps29 were identified as the most stable RG and Ppia and Gapdh as the least stable ( Fig. 4). Interestingly, when experimental conditions were analyzed separately (PF or ID) and combined (PF and ID), Actb, Ppia, and Gapdh all were ranked in the bottom half respectively (data not shown). Results from BestKeeper, comparative delta Cq, Norm-Finder, and RefFinder computational programs were organized to develop a relative overall ranking. The ranking was based on PF and ID experimental groups and all tissues combined. The top two candidates (in rank order of most suitable to least suitable) were Rpl19 and Rps29. The least suitable candidate was Gapdh, with Actb, Ppia, and Rplp0 ranking second in a three-way tie ( Table 5). The same analyses were completed for the top two candidates or most suitable genes in each specific tissue (Table 6).
Lastly, to compare the impact of RG on target gene abundance and the interpretation of data, Tfrc gene expression in the liver of PF and ID animals was examined. Using the two best RG based on the overall ranking, (Rpl19 and Rps29), and two commonly used genes that ranked poorly in our analyses (Gapdh and Ppia), the relative abundance of Tfrc mRNA was determined using the ddCt method [34]. Although Tfrc expression increased significantly in ID animals regardless of the RG utilized, the relative fold changes varied (  Geometric mean (CP), coefficient of variance (CV), and standard deviation (± CP) of the Cq values for putative reference genes. RG are ordered from left to right according to their SD Cq value . Reference genes with a SD Cq value < 1.0 are considered to be an appropriate reference gene when assessing gene expression in the iron-deficient animals. To determine the under-and over-expression of a reference gene relative to the gene's geometric mean (x-fold), the min, max, and standard deviation are calculated by the BestKeeper software  0.05). For normalization using Rpl19 and Rps29 as RG, Tfrc mRNA abundance increased 10-fold and 8-fold in ID animals, respectively (Fig. 5). In contrast to the best ranking RG, using Gapdh and Ppia as RG, Tfrc mRNA abundance increased 6-fold and 7-fold, respectively (Fig.  5).

Discussion
The necessity for ensuring suitable RG in qPCR quantitation has been well recognized [2,20,26,35,36]. Ideal RG should exhibit minimal variation in expression levels among various tissues and under experimental manipulations [35]. The existence of an ideal RG is, however, uncertain at best. To date, limited data has been published on gene expression analyses with nutrient-gene interactions in animal models [37,38] and to our knowledge, an empirical determination of appropriate RG selection in the male weanling rat model of iron deficiency has not been conducted. Additionally, the extent to which RG vary among specific tissues in the same nutritional model has not been examined.
This study was designed to evaluate variation in gene expression in ten commonly used RG in varying dietary (PF and ID) conditions and to identify the RG most suitable for iron deficiency analyses utilizing qPCR in gastrocnemius, heart, kidney, liver, lung, and spleen tissues. Our data is consistent with other research and suggests that commonly used RG may be regulated under experimental conditions and expression stability varies between tissues [26]. It is evident that appropriateness of a RG is likely dependent on the tissue of interest in which gene expression is being analyzed. For example, Gapdh ranked poorly in the majority of the computational programs, both in individual tissues and when combining all tissues for analysis. However, NormFinder and RefFinder data concluded Gapdh had increased stability and was ranked in the top 3 RG in skeletal muscle. Another example of RG inconsistency based on tissue occurred with Rplp0; although Rplp0 is one of the least stable RG in the liver, it is the most stable RG in the lung based on the RefFinder results. These tissue differences were reflected in poor overall ranking when all tissues were combined for analysis. In contrast, both Rpl19 and Rps29 were relatively stable in all tissues resulting in a high overall ranking as determined by all software analyses and our combined overall ranking system. Thus, it is evident that appropriateness of a RG is likely dependent on the tissue of interest in which gene expression is being analyzed, and when comparing multiple tissues simultaneously, it is important RG exhibit relative stability across all tissues. The RG selected herein have diverse biological functions and origination. The RG can be categorized as encoding for (1) ribosomal proteins, (2) structural proteins, or (3) enzymatic proteins (Table 7). Based on our results, rRNA (Rpl19 and Rps29) are the most stable and highest ranking RG for the weanling model of iron deficiency. Although ribosomal RG tend to be more stable in our study, it is important to understand the limitations of using rRNA as RG. First, synthesis of rRNA (RNA polymerase I) and mRNA (RNA polymerase II) are independent, and for that reason, it is thought to be controversial to choose a RG whose transcription is not regulated in the same manner [39]. Second, if original RNA samples were enriched for mRNA, rRNA would be excluded from the isolation process making it an inappropriate control [16]. The samples used in this study were not enriched for any RNA species as evidenced by agarose gel. Next, according to Derveaux et al. [40], it is important to select RG with a similar abundance level to the target mRNA (or gene), making rRNA unsuitable since they are expressed at much higher levels than mRNA. Finally, like mRNA, rRNA have been reported to be regulated under some experimental conditions [20,41,42].
The use of computational programs for determination of the most suitable RG assumes consistent gene expression profiles between experimental groups. Our study, Fig. 4 Comprehensive stability ranking of RG including both experimental conditions. Rankings were determined using RefFinder. Ranking of RG was based on a combined analysis of gene expression in the gastrocnemius, heart, kidney, liver, lung, and spleen. Values were plotted based on ranking number; most stable [1] and least stable [10]  The two most stable and two least stable RG from BestKeeper, comparative ΔCq, NormFinder, and RefFinder were combined to provide an overall ranking of PF and ID experimental groups in all tissues. Overall ranking was determined by quantifying the most stable or least stable genes of all programs consistent with other studies, shows similar results in overall ranking between all computational programs for some genes [43,44]. This could be due to overlap in computational programs as RefFinder uses a combination of NormFinder, BestKeeper, Delta CT, and Genorm. We did not give more weight to any one program to take an unbiased approach to the analyses. Substantial variation was exhibited by some of the RG under examination. For instance, when analyzing all tissues together in both PF and ID animals, Actb ranked as the second most stable gene with the ΔCq method, but then ranked in the bottom half of all genes with BestKeeper, Norm-Finder, and RefFinder. Thus, this type of result supports a more robust approach to RG selection. Despite some The two most stable RG from BestKeeper, comparative ΔCq, NormFinder, and RefFinder were combined to provide an overall ranking of PF and ID experimental groups in gastrocnemius, heart, kidney, liver, lung, and spleen. Overall ranking for each tissue was determined by quantifying the most stable genes of all programs. *Indicates equal stability ranking similarities between computational program results, small differences in RG stability do exist and could lead to unreliable data interpretation. For instance, when liver Tfrc mRNA abundance levels were normalized to the most stable RG (Rpl19 and Rps29) and the least stable RG (Actb and Gapdh), as determined by our overall ranking system, Tfrc mRNA abundance was significantly increased in the ID animals based on all four RG; however, the magnitude of the differences varied. A significant increase in Tfrc mRNA abundance in response to dietary iron deficiency has been well established [45,46], however in studies aiming to evaluate target mRNA that result in marginal mRNA abundance changes, a significance may not be detected. Therefore, it may be necessary to use multiple computational programs when determining the most stable RG for nutrient-gene interaction-focused studies. Additionally, as suggested by Bustin et al. [15], using more than one RG for normalization and choosing the top ranked RG based on the use of multiple computational programs is likely the superior comprehensive approach investigators should use for mRNA normalization.

Conclusions
Small changes in gene expression may be misinterpreted if an appropriate RG is not selected. Therefore, it may be inappropriate to choose RG for a study based solely on previous research or literature reviews instead of taking an empirical approach to identifying the most suitable RG. To our knowledge, this is the first study to examine RG stability for qPCR gene expression analyses focused on dietary condition and tissue type. Based on the ten-selected RG, Rpl19 and Rps29 are the most suitable RG for normalization studies involving gastrocnemius, heart, kidney, liver, lung, and spleen tissues in studies focused on the male weanling model of dietary iron deficiency. The combined ranking system provides a more appropriate evaluation of RG suitability because it provides a thorough assessment of overall RG stability based on four accepted computational RG programs. The model illustrated herein provides an appropriate method for validation of RG, specifically for studies involving dietary responses in multiple tissues, and should be implemented prior to qPCR assays in order to report valid and reliable results.

Methods
Twenty-one-day-old weanling male Sprague-Dawley® (Harlan, Indianapolis, IN USA) rats (n = 24) were housed individually in stainless steel, wire-bottomed cages in an environmentally controlled facility and maintained on a 12-h light: dark cycle at 20°C with ad libitum access to deionized water. All rats were allowed access to the control diet for 3 days prior to starting dietary treatments. Following the acclimation period, animals were randomly assigned to one of three diet groups (n = 8 per group) for 21 days: control iron-replete (C; 40 mg Fe/kg diet), pair-fed ironreplete (PF; control diet with the mean intake of the ID group) or iron-deficient (ID; < 5 mg Fe/kg diet). Diets were purchased from Harlan Teklad (ENVIGO, Madison, WI, USA; C-TD.89300 and ID-TD.80396) based on the recommendations from the American Institute of Nutrition's 1976 (AIN 76) Standards for Nutritional Studies. Individual body weights and food intake were measured daily. Following the 21-day experimental period, animals were anesthetized with a ketamine/xylazine mixture and were sacrificed by exsanguination via the abdominal aorta. The gastrocnemius, heart, kidney, liver, lung, and spleen were snap-frozen in liquid nitrogen immediately following removal and stored at -80°C until subsequent analysis. All institutional guidelines for the care and use of laboratory animals were followed and approved by the OSU Institutional Animal Care and Use Committee.

Quantitative qPCR and data analysis
Gene expression was determined by qPCR using SYBR Green chemistry on an ABI 7900HT sequence detection system instrument and 2.4 SDS software (Applied Biosystems, Foster City, CA, USA). All reactions were performed in 10 μL volumes, including 50 ng of template, 2.5 μM of each forward and reverse primer, and 10 mM of dNTPs (2.5 mM each). Amplification was performed with a 2 min activation step at 50°C, 10 min denaturation step at 95°C, followed by 40 cycles of 90°C for 15 s and 60°C for 1 min. Following each cycle, a dissociation curve analysis was performed using the default settings of the software to confirm the specificity of the PCR products. For each target RG, the relative stability was assessed using BestKeeper, the comparative delta Cq (ΔCq) method, NormFinder, and RefFinder. RG were assessed in individual tissues and based on all tissues combined. They were assessed between experimental conditions (PF and ID) based on all tissues combined. Potential RG analyzed included Actb, Gapdh, Hprt, Ppia, Rpl19, Rpl22, Rpl27, Rplp0, Rps29, and Tbp ( Table 7). The overall rankings were determined by using all tissue combined analyses or each specific tissue for each stability program. The two most stable and two least stable RG for each program were quantified based on the number of times they were ranked at the top or bottom. The two most stable and two least stable genes from the all tissues ranking were further used as RG to compare Tfrc gene expression in PF versus ID rat livers. The comparative ΔΔCq method was used to analyze mRNA abundance [47]. Oligonucleotide primers (Table  8) were obtained from Integrated DNA Technologies (IDT, Coralville, IA, USA) and designed using Primer Express software 3.0.1 (Applied Biosystems, Foster City, CA, USA). Briefly, nucleotide sequences were acquired from NCBI and primers were designed to cross exons, not exceed an amplicon length of 100 nucleotides, and have the lowest possible error rate.

Statistical analysis
Analyses were performed using SPSS version 23.0 software (IBM Corp., Armonk, NY, USA). Statistical analyses using Student's t test were performed to determine treatment effects of dietary condition (ID or PF). Values are expressed as means ± standard error of the mean (SEM).