Copy number variations in response to chronic pollution: Basilichthys microlepidotus in central Chile

Jorge Cortés-Miranda1, David Veliz1, Ciro Rico2 and Caren Vega-Retter1


Abstract


EN

Pollution, driven by land use, industrial operations, and urban growth, significantly affects biodiversity in freshwater ecosystems. Studies have shown how freshwater organisms adapt to pollution, observing mechanisms like directional selection, balancing selection, and introgression. These studies have focused on genetic changes in populations exposed to pollution, particularly single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) in DNA. CNVs have been linked to environmental disturbances. This study investigates CNVs in Basilichthys microlepidotus in Chile’s polluted Maipo River watershed. CNVs were associated with pollution in chronically exposed populations, though population structure was weak, making it difficult to distinguish between reference and contaminated sites. However, outlier loci with pollution-related functions were consistently identified. Eleven CNV loci, accounting for 5% of all detected CNV loci, correlated with three historical physical variables: electroconductivity, pH, and total dissolved solids. These markers revealed a subtle but significant population structure, linking CNVs to gene expression changes and SNPs potentially affected by pollution-driven selection. The effects of these CNVs are unknown, and further analysis is required to unveil them, but they could potentially help these organisms adapt to environmental contamination.

Keywords: Contamination, Freshwater, Genomics, Population genetics, Structural variants.

ES

La contaminación, impulsada por el uso del suelo, las operaciones industriales y el crecimiento urbano, afecta significativamente la biodiversidad en los ecosistemas de agua dulce. Estudios han mostrado cómo los organismos de agua dulce se adaptan a la contaminación, observando mecanismos como la selección direccional, la selección balanceadora y la introgresión. Se han centrado en cambios genéticos en poblaciones expuestas a contaminación, particularmente polimorfismos de nucleótido único (SNPs) y variantes del número de copias (CNVs) en ADN. Las CNVs se han relacionado con disturbios ambientales. Este estudio investiga las CNVs en Basilichthys microlepidotus en la cuenca contaminada del río Maipo en Chile. Las CNVs estuvieron asociadas con contaminación en poblaciones crónicamente expuestas, aunque la estructura poblacional fue débil, lo que dificultó distinguir entre los sitios de referencia y contaminados. Sin embargo, consistentemente se identificaron valores atípicos relacionados con funciones de contaminación. Once loci de CNV se correlacionaron con tres variables físicas históricas: electroconductividad, pH y sólidos disueltos totales, representando el 5% de todos los loci CNV detectados. Estos marcadores revelaron una estructura poblacional sutil pero significativa, vinculando CNVs con cambios en expresión génica y SNPs potencialmente afectados por la selección impulsada por contaminación. Se desconocen los efectos de estas CNVs, y se requiere análisis adicionales para revelarlos, pero potencialmente podrían ayudar a estos organismos a adaptarse a la contaminación ambiental.

Palabras clave: Agua dulce, Contaminación, Genética de poblaciones, Genómica, Variantes estructurales.

Introduction


Pollution poses a significant threat to biodiversity and wellbeing (Sigmund et al., 2023). Land usage, industrial activity, and urban expansion are the main contributors to pollution (Ukaogo et al., 2020). Freshwater environments are significantly affected by pollution, making them some of the most impacted ecosystems (Mushtaq et al., 2020). Populations living in these contaminated systems face three potential outcomes: extinction, migration to avoid the pollution, or evolutionary adaptation (Oziolor et al., 2019).

Studies have extensively documented how animals adapt to pollution in freshwater habitats. Genetic changes in populations exposed to pollution have received particular attention, primarily single nucleotide polymorphisms, or SNPs (Bélanger-Deschênes et al., 2013; Reitzel et al., 2014; Park et al., 2021). For example, Bélanger-Deschênes et al. (2013) revealed that SNPs in genes associated with the cell cycle can exhibit directional selection in populations of yellow perch Perca flavescens (Mitchill, 1814) exposed to cadmium and copper. Furthermore, in Basilichthys microlepidotus (Jenyns, 1841), a silverside fish exposed to pollutants originating from agricultural and residential sources, amplified fragment length polymorphism (AFLP) markers revealed loci with high frequency in populations inhabiting polluted sites as well as loci with multiple variants (Vega-Retter et al., 2015).

Recent studies have also examined structural changes in DNA that involve large differences in the number of base pairs. One example is copy number variants (CNVs), which are changes of more than 50 bases in the number of copies of a region in the genome (Pös et al., 2021). CNVs can encompass different genome regions, including introns, exons, and whole genes, depending on the base pairs involved (Chain et al., 2014). Other studies have established a connection between CNVs and environmental disturbance. For example, Dorant et al. (2020) identified 48 CNVs linked to the yearly fluctuations in sea surface temperature in the lobster Homarus americanus Milne Edwards, 1837. Bazzicalupo et al. (2019) identified CNVs in genes responsible for zinc transport in the fungus Suillus luteus Roussel, 1796, in populations exposed to soils with varying levels of pollution. In the three-spined stickleback (Gasterosteus aculeatus Linnaeus, 1758), a fish that lives in lakes and rivers, CNVs in some genes were associated with environmental factors in those habitats (Chain et al., 2014). Amazonian fishes exposed to pollution, primarily from urban development, dumps, and chemical discharges, provide another example: most of the exposed populations showed changes in the copy number of 18S and 5S rDNA, which was linked to maintaining genome integrity (Silva et al., 2019). A CNV that covers a whole gene can have no effect or can change how the gene is expressed (Orozco et al., 2009). Smaller CNVs, such as those found in introns or exons, can have different effects, such as altering the protein sequence (Chain et al., 2014).

Human activities heavily affect the Maipo River basin in Chile, home to around 40% of the country’s population (CENSO, 2017). The basin is also bordered by agricultural operations (Peña-Guerrero et al., 2020). The chemical and physical makeup of the basin has changed clearly over the past 17 years: levels of many chemicals related to farming activities and wastewater discharges have risen, and water quality has worsened (Wilkinson et al., 2022; Cortés-Miranda et al., 2024a; Soriano et al., 2024). For example, Vega-Retter et al. (2014) considered a site in Isla de Maipo (IM) as a reference site, but Cortés-Miranda et al. (2024a) revealed that its physical and chemical conditions have changed and that it is now similar to polluted sites in the basin. This basin is home to the silverside B. microlepidotus, a vulnerable fish species. Lakes and rivers between latitudes 28°S and 39°S harbour B. microlepidotus populations (Véliz Baeza et al., 2012). Its reproductive period runs from August to January, and it has a generation time of one year (Comte, Vila, 1992). This omnivorous microphagous species feeds on insect larvae, small invertebrates, detritus, and filamentous algae (Duarte et al., 1971; Bahamondes et al., 1979). The exposure of Basilichthys microlepidotus to pollution has been studied thoroughly. Its populations reside in several locations within the basin, each with a different level of pollution (Vega-Retter et al., 2014; Veliz et al., 2020; Cortés-Miranda et al., 2024a). Studies have documented directional and balancing selection in this species in response to pollution (Vega-Retter et al., 2015, 2018, 2024; Veliz et al., 2020). In addition, Vega-Retter et al. (2024) identified candidate loci under selection using SNPs.
Recent studies found these loci in genes linked to pollution and showed that they play a part in many biological processes, such as controlling cell death and the immune system. Furthermore, these studies found differential expression of genes at these loci. The chemical defensome and the transcriptome as a whole have also shown changes in gene expression. For instance, Cortés-Miranda et al. (2024b) observed an increase in biological processes associated with immune response and cell division, which they linked to pollution. Vega-Retter et al. (2018) and Veliz et al. (2020) also suggest a possible connection between selection and gene expression, especially for the ornithine decarboxylase gene, which is an oncogene.

Structural changes in the genome that involve several base pairs (Mérot et al., 2020) have also been studied, but the role they might play in these gene expression patterns has not yet been explored. The main goal of this study was to find pollution-related genetic differences and copy number variations (CNVs) in the B. microlepidotus populations that live in the Maipo River watershed.

Material and methods


Sampling sites and sample collection. Four sampling sites in the Maipo River basin, Chile, were visited in the spring of 2019. The sites were selected based on historical pollution data, which classify three sites as polluted, Melipilla (MEL, 33°42’49.98”S 71°12’39.13”W), Pelvin (PEL, 33°36’21”S 70°54’33”W), and Isla de Maipo (IM, 33°44’58”S 70°53’26”W), and one as non-polluted, San Francisco de Mostazal (SFM, 33°58’19.97”S 70°42’56.49”W) (Vega-Retter et al., 2014, 2015, 2018, 2024; Cortés-Miranda et al., 2024a) (Fig. 1).

FIGURE 1| Map of the sampling sites in the Maipo River basin. Polluted sites are marked with a red rhombus and the reference site with a yellow rhombus. The urban area is shown in grey. SFM = San Francisco de Mostazal, MEL = Melipilla, PEL = Pelvin, IM = Isla de Maipo. Map made using QGIS v. 3.16.11.

The MEL and PEL sites are downstream of wastewater treatment plants associated with the city of Santiago. Vega-Retter et al. (2014) showed poor water quality at those sites based on chemical and physical characterisation. Cortés-Miranda et al. (2024a) reached the same conclusion in a temporal analysis of ten physical and chemical variables: electrical conductivity (EC), pH, total dissolved solids (TDS), dissolved oxygen (DO), nitrite (NO2), ammonium (NH4+), sodium (Na+), potassium (K+), calcium (Ca2+), and magnesium (Mg2+). That analysis associated the MEL and PEL sites with high total dissolved solids and nutrients and showed that the IM site has been degraded in recent years. Additionally, Soriano et al. (2024) and Wilkinson et al. (2022) reported organic pollutants around the MEL and PEL sites. In contrast, the SFM site is in a less impacted, less populated area, and the historical physical and chemical parameters reported by Cortés-Miranda et al. (2024a) separate it from the polluted sites. Tab. 1 summarises the impacts in the basin. Between 20 and 24 individuals of Basilichthys microlepidotus were sampled at each site using an electrofishing device. The fish were kept in an oxygenated tank until a small portion of the anal fin was dissected and preserved in 95% ethanol; after recovery, all fish were released.

TABLE 1 | Environmental problems affecting the Maipo River basin. MEL = Melipilla, PEL = Pelvin, IM = Isla de Maipo.

| Environmental problem | Sites affected | References |
| --- | --- | --- |
| Drought and agriculture activities | MEL-PEL-IM | Peña-Guerrero et al. (2020) |
| Pharmaceuticals | MEL-PEL | Wilkinson et al. (2022) |
| PAHs and other organic pollutants | MEL-PEL | Soriano et al. (2024) |
| Macronutrients enrichment | MEL-PEL-IM | Cortés-Miranda et al. (2024a) |


Sequencing process. Diversity Arrays Technology Pty Ltd (DArT) in Canberra, Australia, extracted and sequenced genomic DNA from a small portion of each anal fin sample. We used DArTseq™ technology to sequence all individuals after reducing genome complexity, as previously described in Vega-Retter et al. (2024). The raw fastq sequencing files, which consist of single reads with a mean length of 70 base pairs, were used for the subsequent steps, and quality was checked using FastQC (Bittencourt, 2010). The raw sequencing data are available in the Figshare repository (Tab. S1).

CNV detection. We followed the pipeline proposed by Karunarathne et al. (2023) to identify CNVs in all populations. First, we used STACKS2 v. 2.68 (Catchen et al., 2013) to obtain the SNP data. We ran the process_radtags command to remove the adapters from the sequences, using the following options: -r -c -q --renz_1 pstI --renz_2 sphI. We then executed the denovo_map.pl script with the following parameters: -M 3 -n 3 -T 7 --min-populations 1 --min-samples-per-pop 0.60 -X "populations: --fasta-loci --vcf --genepop" -X "ustacks: -m 4" to obtain a raw vcf file with the single nucleotide polymorphisms present in the data. We used this raw vcf as input for the rCNV package v. 1.2.9 (Karunarathne et al., 2023) to detect CNVs in R software v. 4.1.0 (R Development Core Team, 2021). We filtered the raw vcf, discarding samples with >50% missing data, highly related samples with an averaged pairwise relatedness index (Ajk) >0.9, and samples with a FIS value below -0.2, which most likely reflects DNA contamination. We also removed SNPs with >50% missing data. Next, we generated two tables: one for allele depth and another for normalised depth counts. We used the median ratio normalisation method to prevent depth variation due to sequencing heterogeneity. Following this process, we removed six samples identified as outliers: two from the SFM population, two from the PEL population, one from the MEL population, and one from the IM population.
Using the normalised read depth and allele depth tables, we generated all the statistics needed to identify SNPs that deviate from the expected allelic ratio of heterozygotes or present an excess of heterozygotes according to Hardy-Weinberg (deviant SNPs): i) allele ratios across all samples, ii) proportions of homo-/heterozygotes per SNP, iii) depth ratios, iv) Z-score, v) chi-square significance, and vi) excess heterozygosity. Three criteria identify deviant SNPs, or putative CNVs: i) an excess of heterozygotes; ii) heterozygous SNPs with depth values that do not follow a normal Z-score distribution; and iii) heterozygous SNPs with depth values that do not follow a chi-square distribution (Karunarathne et al., 2023). Finally, we applied the K-means algorithm, which uses the Z-score, chi-square, excess of heterozygotes, and coefficient of variation from read-depth dispersion, to obtain the CNVs from the deviant SNPs using the cnv function.
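The statistics behind the deviant-SNP criteria can be illustrated with a minimal sketch. It assumes that reads of a single-copy heterozygote split evenly between the two alleles (binomial with p = 0.5) and that per-sample Z-scores are combined as their sum divided by the square root of the number of heterozygotes; both are assumptions made for illustration, and the rCNV package may compute these quantities differently. All function names are hypothetical.

```python
import math

def allele_ratio_stats(ref_depth, alt_depth, expected_ratio=0.5):
    """Return (z, chi2) for the read split of one heterozygous genotype."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("genotype has no reads")
    expected_alt = total * expected_ratio
    expected_ref = total - expected_alt
    # Binomial standard deviation of the alternative-allele read count.
    sd = math.sqrt(total * expected_ratio * (1.0 - expected_ratio))
    z = (alt_depth - expected_alt) / sd
    chi2 = ((alt_depth - expected_alt) ** 2 / expected_alt
            + (ref_depth - expected_ref) ** 2 / expected_ref)
    return z, chi2

def combined_z(het_depths):
    """Combine per-sample Z-scores of one SNP (assumed: sum / sqrt(n))."""
    scores = [allele_ratio_stats(ref, alt)[0] for ref, alt in het_depths]
    return sum(scores) / math.sqrt(len(scores))

def two_sided_p(z):
    """Two-sided p-value of a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def heterozygote_excess(n_hom_ref, n_het, n_hom_alt):
    """Observed minus Hardy-Weinberg expected heterozygote proportion."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    return n_het / n - 2 * p * (1 - p)

# Toy SNP: four heterozygotes, each with 5 reference and 15 alternative reads.
z = combined_z([(5, 15)] * 4)
print(round(z, 3), two_sided_p(z) < 0.05)
print(heterozygote_excess(0, 10, 0))  # every sample heterozygous
```

A SNP whose heterozygotes show a skewed allele ratio, or that has many more heterozygotes than Hardy-Weinberg predicts, would be flagged as deviant under these criteria; the K-means step described above then decides which deviants are treated as putative CNVs.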

Population structure. We evaluated the genetic structure of the populations using the Variant Fixation Index (VST) (Redon et al., 2006) and Principal Component Analysis (PCA). Both analyses used the normalised read depth of the identified CNVs. We calculated VST using the vst function of the rCNV package v. 1.2.9 (Karunarathne et al., 2023) in R software v. 4.1.0 (R Development Core Team, 2021). To assess whether the calculated VST values differed from those expected at random, we conducted a permutation test with 5000 permutations using the vstPermutation function in the same package. We conducted the PCA in R software v. 4.1.0 using the s.class function from the adegenet package v. 2.1.10 (Jombart, 2008).
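As an illustration of the VST statistic, the sketch below computes it for a single locus and two populations as (VT − VS) / VT, where VT is the variance across all individuals and VS is the within-population variance weighted by population size, following the definition in Redon et al. (2006). It uses population variances and a simple label-shuffling permutation test; these are illustrative assumptions, the function names are hypothetical, and the vst and vstPermutation functions of rCNV may differ in detail.

```python
import random
import statistics

def vst(depth_a, depth_b):
    """VST of one locus from the normalised read depths of two populations."""
    pooled = list(depth_a) + list(depth_b)
    v_total = statistics.pvariance(pooled)
    if v_total == 0:
        return 0.0
    n_a, n_b = len(depth_a), len(depth_b)
    # Within-population variance, weighted by population size.
    v_within = (n_a * statistics.pvariance(depth_a)
                + n_b * statistics.pvariance(depth_b)) / (n_a + n_b)
    return (v_total - v_within) / v_total

def vst_permutation_p(depth_a, depth_b, n_perm=5000, seed=1):
    """Observed VST and the share of label shuffles that reach or exceed it."""
    rng = random.Random(seed)
    observed = vst(depth_a, depth_b)
    pooled = list(depth_a) + list(depth_b)
    n_a = len(depth_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if vst(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add one to numerator and denominator so that p is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: hypothetical normalised depths at one locus in two populations.
reference = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
polluted = [8.1, 8.4, 7.9, 8.3, 8.0, 8.2]
print(vst_permutation_p(reference, polluted, n_perm=500))
```

With population variances and size weights, VT equals VS plus the between-population variance, so the statistic stays between 0 (no differentiation) and 1 (all variance lies between populations).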

Outlier detection, annotation and CNV association with environmental variables. Using the data from the population structure analysis, we performed a permutation t-test to find CNV loci whose normalised read depth differed significantly between the reference population and the polluted populations. We used R software v. 4.1.0 (R Development Core Team, 2021) for this purpose, implementing the perm.t.test function from the MKinfer package v. 1.1 (Kohl, Kohl, 2020). To control type I error, we adjusted the p-values with the qvalue function implemented in the qvalue package v. 2.26.0 (Storey et al., 2023), with a cut-off of < 0.05. We used the local blast function of the NCBI blast software (Camacho et al., 2009) to blast all the detected outlier loci against the de novo transcriptome assemblies of liver and gill available for this species (Cortés-Miranda et al., 2024b) to identify potential genes with CNVs related to pollution. We then subjected the GO terms of the annotated loci to a semantic reduction process using REVIGO (Supek et al., 2011) to summarise the biological processes associated with them. Furthermore, we performed a redundancy analysis (RDA), following the steps described in Forester et al. (2018), to find associations between environmental variables and the normalised read depth of 216 CNVs, none of which had missing values in all individuals of any population. We used the rda function of the vegan package v. 2.6.4 (Oksanen et al., 2013) for this analysis. We used the CNV data from all populations and the historical data from Cortés-Miranda et al. (2024a) for the environmental variables.
These variables comprised a time series (2007–2016) of ten historical chemical and physical variables for all the sites: electroconductivity (EC), pH, total dissolved solids (TDS), dissolved oxygen (DO), nitrite (NO2), ammonium (NH4+), sodium (Na), potassium (K), calcium (Ca), and magnesium (Mg). All variables present lower values in the control site SFM, as shown in Cortés-Miranda et al. (2024a). We used 999 permutations and a p-value threshold of 0.05 to identify significant axes. Loci whose loadings on the significant axes lay at least 2.2 SD from the mean loading were considered candidates. We then assigned each candidate locus to the environmental variable with which it showed the highest correlation.
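Two steps of this procedure lend themselves to a short sketch: the permutation t-test with multiple-testing adjustment, and the SD-based cutoff on RDA loadings. The code below is an illustration under stated assumptions, not the analysis itself: it uses a Welch t statistic with random label shuffling, and it swaps in the Benjamini-Hochberg adjustment in place of the q-value method of the qvalue package. All function names are hypothetical.

```python
import math
import random
import statistics

def welch_t(x, y):
    """Welch t statistic for two samples with at least two values each."""
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    if se == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def permutation_t_test(x, y, n_perm=2000, seed=1):
    """Two-sided p-value from shuffling population labels."""
    rng = random.Random(seed)
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def loading_outliers(loadings, n_sd=2.2):
    """Indices of loci whose loading lies at least n_sd SD from the mean."""
    mean = statistics.mean(loadings)
    sd = statistics.stdev(loadings)
    return [i for i, v in enumerate(loadings) if abs(v - mean) >= n_sd * sd]

# Toy data: hypothetical normalised depths of one locus in two populations.
sfm = [10, 11, 12, 10, 11, 12, 10, 11]
pel = [20, 21, 22, 20, 21, 22, 20, 21]
print(permutation_t_test(sfm, pel) < 0.05)
print(loading_outliers([0.0] * 20 + [5.0]))
```

In practice the permutation test would be run once per CNV locus and per population pair, the resulting p-values adjusted together, and the loading cutoff applied to each significant RDA axis.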

Results


CNV detection. Following the de novo pipeline in STACKS2, a raw vcf was obtained, containing information about 94 individuals and 20,931 loci. All samples passed the filters of missing data, relatedness, and high heterozygosity. We obtained a total of 18,779 loci after removing loci with missing data. We discarded six samples in the read depth normalisation step: two from the SFM population, two from the PEL population, one from the MEL population, and one from the IM population, resulting in 88 individuals. We used these individuals and loci to identify putative CNV loci. We detected a total of 584 deviant SNPs; after K-means classification, only 216 were considered putative CNVs for further analysis.
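The read depth normalisation step mentioned above can be illustrated with a median-of-ratios sketch. It assumes the common formulation in which each sample is divided by a size factor equal to the median ratio between its depths and the per-locus geometric mean across samples; the median ratio option in rCNV may differ in detail, and the function name is hypothetical.

```python
import math
import statistics

def median_ratio_normalise(depth):
    """Normalise depth[sample][locus]; return (normalised, size_factors)."""
    n_loci = len(depth[0])
    # Use only loci with reads in every sample to build the reference.
    usable = [j for j in range(n_loci) if all(row[j] > 0 for row in depth)]
    if not usable:
        raise ValueError("no locus has reads in every sample")
    # Log of the geometric mean depth of each usable locus.
    log_geo_mean = {
        j: statistics.fmean([math.log(row[j]) for row in depth])
        for j in usable
    }
    size_factors = []
    for row in depth:
        log_ratios = [math.log(row[j]) - log_geo_mean[j] for j in usable]
        size_factors.append(math.exp(statistics.median(log_ratios)))
    normalised = [[value / factor for value in row]
                  for row, factor in zip(depth, size_factors)]
    return normalised, size_factors

# Toy data: the second sample was sequenced twice as deeply as the first.
raw = [[10, 20, 30],
       [20, 40, 60]]
normalised, factors = median_ratio_normalise(raw)
print(factors)
print(normalised)
```

After normalisation the two toy samples have the same depth at every locus, so any remaining difference between samples at a locus would reflect something other than sequencing effort.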

Population structure. The pairwise VST values ranged between 0.002 and 0.0276, and after the permutation test, all values were statistically significant except the MEL-SFM comparison (Tab. 2). In the PCA of normalised read depth, PC1 and PC2 together accounted for 10.3% of the total variance (PC1, 5.9%; PC2, 4.4%), indicating a weak CNV-based population structure for the silverside across the Maipo River basin sites, with differentiation mainly between SFM and PEL (Fig. 2).

TABLE 2 | Pairwise VST. *Denotes significant p-values (p < 0.001) after 5000 permutations, for each pair of comparisons. SFM = San Francisco de Mostazal, PEL = Pelvin, MEL = Melipilla, IM = Isla de Maipo.


|  | PEL | IM | MEL |
| --- | --- | --- | --- |
| SFM | 0.0276* | 0.0229* | 0.002 |
| PEL |  | 0.0206* | 0.0219* |
| IM |  |  | 0.0221* |


FIGURE 2| PCA using normalized read counts for 216 CNV loci in all populations. SFM = San Francisco de Mostazal, MEL = Melipilla, PEL = Pelvin, IM = Isla de Maipo.

Outlier detection, annotation and CNV association with environmental variables. We searched for loci that differed in the PEL-SFM and IM-SFM comparisons, since the CNV data could not differentiate MEL from SFM. The permutation t-test identified 21 and 6 outlier loci in the PEL-SFM and IM-SFM comparisons, respectively, two of which were shared by both comparisons (Fig. 3). Five of the outlier loci in the PEL-SFM comparison and two in the IM-SFM comparison matched transcribed genes. The five matching loci in the PEL-SFM comparison are linked to 14 GO terms that describe three biological processes (Fig. 4): lipid transport, methylation, and the RNA-templated DNA biosynthetic process. Of the two outliers shared by the PEL-SFM and IM-SFM comparisons, only one matched the transcriptome; it was identified as Vitellogenin A (Tab. 3). The RDA yielded an adjusted R2 of 0.017, which is statistically significant (p < 0.01); that is, the constrained axes explain 1.7% of the variance. The first RDA axis (RDA1) accounted for 29.41% of the total variance, and the second RDA axis (RDA2) accounted for 17.13% of the variance. Only the first axis was statistically significant (p < 0.01). The population plot shows PEL and IM segregated from SFM, while MEL overlapped with SFM (Fig. 5A). Of the 216 CNV loci analysed, 11 were associated with three historical environmental variables: five with EC, two with pH, and four with TDS (Fig. 5B). Seven of these loci were also outliers in the PEL-SFM pairwise comparison and two in the IM-SFM comparison. The two loci shared by the PEL-SFM and IM-SFM pairwise comparisons were also among the 11 loci found in the RDA. Three of the eleven loci matched a transcribed gene in the blast analysis.
One of those was the locus matching Vitellogenin A, which was an outlier in the permutation tests comparing the control population SFM with PEL and with IM. The other two, Complement C3-like (C3-like) and G2/mitotic-specific cyclin-B2-like (CCNB2-like), were outliers only in the PEL-SFM permutation test (Tab. 3). The CNV at Vitellogenin A correlates negatively with TDS, the CNV at Complement C3-like correlates negatively with EC, and the CNV at G2/mitotic-specific cyclin-B2-like correlates positively with pH. Finally, the normalised read counts of Vitellogenin A (Fig. 6A) and Complement C3-like (Fig. 6B) reveal that IM and PEL have lower read counts than the SFM population, while G2/mitotic-specific cyclin-B2-like (Fig. 6C) shows the opposite pattern.

TABLE 3 | Annotated outliers for each comparison. SFM = San Francisco de Mostazal, PEL = Pelvin, IM = Isla de Maipo.

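| Comparison | Loci | Gene |
| --- | --- | --- |
| PEL-SFM | 780:67 | Complement C3-like |
| PEL-SFM | 4129:13 | Adenylate cyclase type 2-like |
| PEL-SFM | 29079:63 | Vitellogenin A |
| PEL-SFM | 34378:5 | G2/mitotic-specific cyclin-B2-like |
| PEL-SFM | 102889:15 | Protein IWS1 homolog |
| IM-SFM | 8701:13 | Annexin A2-like |
| IM-SFM | 29079:63 | Vitellogenin A |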


FIGURE 3| Venn diagram of outliers detected by the permutation t-test in the PEL-SFM and IM-SFM comparisons, with the names of the transcribed genes that match the loci. PEL = Pelvin, IM = Isla de Maipo. vtgA: Vitellogenin A, CCNB2-like: G2/mitotic-specific cyclin-B2-like, C3-like: Complement C3-like, ADCY2-like: Adenylate cyclase type 2-like, IWS1-like: Protein IWS1 homolog, ANXA2P2-like: Annexin A2-like.

FIGURE 4| Biological processes present in outliers at PEL-SFM comparison after the permutation t-test.

FIGURE 5| RDA plot of 216 CNVs and 9 historical environmental variables, showing the distribution of individuals and populations (A) and the CNVs correlated with environmental variables (B) (in color). SFM = San Francisco de Mostazal, MEL = Melipilla, PEL = Pelvin, IM = Isla de Maipo. The names of the transcribed genes that match a CNV locus are shown in black, with an arrow pointing to the locus.

FIGURE 6| Violin plot of normalized read counts of Vitellogenin A (A), Complement C3-like (B), and G2/mitotic-specific cyclin-B2-like (C) in the reference population (pink) and the three polluted populations (orange). SFM = San Francisco de Mostazal, MEL = Melipilla, PEL = Pelvin, IM = Isla de Maipo. The asterisk represents statistical differences in the permutation t-test (qvalue < 0.05).

Discussion


We found that CNVs were linked to pollution in populations of B. microlepidotus chronically exposed to it in the Maipo River basin. The CNV markers revealed a weak genetic structure, which agrees with the population structure previously described using other types of genetic markers (Vega-Retter et al., 2014, 2024). In addition, three annotated loci showed differences in normalised read depth between polluted and unpolluted populations, and all three matched transcribed genes: Vitellogenin A, Complement C3-like, and G2/mitotic-specific cyclin-B2-like. The first two showed lower normalised read counts, and the third showed higher normalised read counts, in the IM and PEL populations than in the reference site (SFM). In this study, we found fewer CNV markers than SNP markers in the same dataset (Vega-Retter et al., 2024). The small portion of the genome scanned and the different structural evolution patterns of the two marker types may explain this difference, but even this small number of markers could significantly influence B. microlepidotus populations exposed to pollutants and their adaptation, given the number of base pairs involved (Mérot et al., 2020). Notwithstanding these constraints, we identified a link between pollution and a response at the CNV level. SNP datasets could therefore be used to investigate this type of structural variation in more non-model species. Studies at the whole-genome level may help us understand the structure of the markers found in this study and how they work.

Population structure. Other genetic markers, such as microsatellites (Vega-Retter et al., 2014) and SNPs (Vega-Retter et al., 2024), also exhibited a limited population structure, similar to what we found with CNV markers. Nevertheless, the structure revealed by the CNVs is notably weak and cannot differentiate the SFM and MEL populations. The genetic dynamics of this marker (Pös et al., 2021) and the limited number of CNVs discovered, which may be biased by the small section of the genome analysed, could be factors influencing this result. In this scenario of weak population structure, it is unlikely that genetic drift is the main force behind the CNV outlier loci found. A similar case occurs in populations of Fundulus heteroclitus (Linnaeus, 1766), where the sensitive and the pollution-resistant populations are closely related, which makes it an ideal scenario for finding adaptive loci (Oziolor et al., 2019). We identified outliers associated with pollution-related functions, particularly in the PEL population, which, given the population structure found, could be related to adaptive processes. This population presented CNVs whose normalised read depth differed statistically from that of the SFM control group, and those CNVs were associated with biological processes such as lipid transport. A previous study (Cortés-Miranda et al., 2024b) revealed that these biological functions were present in genes with differential expression patterns. Moreover, the involvement of the methylation process implies a possible association between CNVs and epigenetics, considering the well-documented connection between them (Shi et al., 2020).

CNV association with environmental variables. Some CNV loci showed a correlation with three historical physical variables: five with EC, two with pH, and four with TDS, which account for 5% of all CNV loci detected. A study of Anopheles gambiae Giles, 1902, found that only 0.3% of the CNVs were under positive selection in genes related to pesticide exposure (Lucas et al., 2019). In our study, we only annotated three of these eleven loci, which represent 1.3% of all the CNV loci found.

The two methods detected outlier loci related to a variety of processes, including immunity and the cell cycle. In particular, the PEL and IM populations share Vitellogenin A, a major precursor of egg yolk that plays multiple roles in immunity and antioxidant response (Hara et al., 2016). Vitellogenin genes are essential for embryo development and for survival of the first larval stages in fish. Vitellogenin A belongs to the type I vitellogenins, and in zebrafish it has been observed to be responsible for the main amino acid content of the egg yolk; additionally, if it is silenced or deleted, the larval survival rate decreases (Yilmaz et al., 2024). High vitellogenin gene expression has been found in male flatfish living in the Gulf of Mexico and has been linked to pollution (Cañizares-Martínez et al., 2024), and vitellogenin is used as a biomarker of endocrine disruptor exposure in fish. To our knowledge, no other study has found CNVs at the population level in this gene. However, teleost genomes contain different vitellogenin paralogues generated by whole genome duplication or gene duplication (Biscotti et al., 2018), so those kinds of events may happen at the population level too. This gene had fewer reads in the exposed populations (PEL-IM), but other RNA-seq studies of the same populations did not find any differences in gene expression (Vega-Retter et al., 2018). CNVs may not affect gene expression, but they can cause other genomic alterations, such as changes in chromatin interaction domains (Pös et al., 2021). Because a gene like Vitellogenin A is so important for development, this CNV might only affect a small part of the gene and cause a deletion in the populations that are exposed. However, we need to look at the whole gene sequence to be sure.

The Complement C3-like gene belongs to the complement system, which is part of the innate immune system. It is known that pollutants can affect both the immune system and the complement system (Bavia et al., 2022). Xu et al. (2015) and Zhou et al. (2011) observed downregulation of the Complement C3 gene in the presence of organic pollutants. In a previous study, Cortés-Miranda et al. (2024b) found that biological processes related to organic pollutants and the immune system were enriched among differentially expressed genes in the PEL population, which may be connected to this CNV. Teleosts have considerable genetic diversity in immune-related genes at the CNV level (Reid et al., 2017; Mohamadnejad Sangdehi et al., 2024), which could help them adapt to stress conditions like chronic pollution.

Wu et al. (2021) showed that the G2/mitotic-specific cyclin-B2 gene helps cells divide during the cell cycle, especially in the G2/M phase. Pollutants such as nickel (Guo et al., 2021), molybdenum, and cadmium (Pu et al., 2023) are known to affect the cell cycle, including the G2/M checkpoint. Copy number variations in cyclins have been linked to cancer (Lockwood et al., 2011), but there have been no population-level studies in fish. In general, Vega-Retter et al. (2024) observed the same pattern, with SNP loci showing signals of selection due to pollution in the same populations. Further testing is necessary to determine whether these markers influence gene expression.

The study’s main goal was to find copy number variations (CNVs) in populations of B. microlepidotus that have been exposed to pollution for a long time in the Maipo River basin. We were able to find relevant CNV markers linked to pollution, even though we did not have a reference genome for this non-model species and could analyse only a small part of its genome. The markers identified a subtle but noteworthy population structure, with 11 markers that may be associated with past environmental factors. Our findings suggest a possible connection between the CNVs, changes in gene expression, and SNP candidates affected by pollution-induced selection. However, this connection has been observed only in specific genes. Because they affect a significant portion of the genome and are associated with other biological processes such as gene expression and epigenetics, these markers have the potential to help organisms adapt to environmental contamination. In this study, we have taken the first steps towards including structural variants such as CNVs in pollution research on non-model organisms, using SNP data from reduced-representation genome techniques, and we hope it encourages research in this important area in more non-model species.

Acknowledgments


CVR was supported by Universidad de Chile Proyecto ENLACE FONDECYT (ENL16/20). CR was supported by Fondos intramurales del Consejo Superior de Investigaciones Científicas. JCM was supported by a doctoral fellowship from the Agencia Nacional de Investigación y Desarrollo (number 21200769) and by doctoral thesis fellowship number 242220080.

References


Andrews S. FastQC: A quality control tool for high throughput sequence data. Babraham Bioinformatics. 2010.

Bahamondes I, Soto D, Vila I. Hábitos alimentarios de los pejerreyes (Pisces, Atherinidae) del lago Rapel, Chile. Medio Ambiente. 1979; 4(1):3–18.

Bavia L, Santiesteban-Lores LE, Carneiro MC, Prodocimo MM. Advances in the complement system of a teleost fish, Oreochromis niloticus. Fish Shellfish Immunol. 2022; 123: 61–74. https://doi.org/10.1016/j.fsi.2022.02.013

Bazzicalupo AL, Ruytinx J, Ke YH, Coninx L, Colpaert JV, Nguyen NH, Vilgalys R, Branco S. Incipient local adaptation in a fungus: evolution of heavy metal tolerance through allelic and copy-number variation. BioRxiv. 2019; 832089. https://doi.org/10.1101/832089

Bélanger-Deschênes S, Couture P, Campbell PGC, Bernatchez L. Evolutionary change driven by metal exposure as revealed by coding SNP genome scan in wild yellow perch (Perca flavescens). Ecotoxicology. 2013; 22:938–57. https://doi.org/10.1007/s10646-013-1083-8

Biscotti MA, Barucca M, Carducci F, Canapa A. New perspectives on the evolutionary history of vitellogenin gene family in vertebrates. Genome Biol Evol. 2018; 10(10): 2709–15. https://doi.org/10.1093/gbe/evy206

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. BLAST+: architecture and applications. BMC Bioinformatics. 2009; 10:421. https://doi.org/10.1186/1471-2105-10-421

Cañizares-Martínez MA, Quintanilla-Mena MA, Árcega-Cabrera F, Ceja-Moreno V, Del Río-García M, Reyes-Solian SG, Rivas-Reyes I, Puch-Hau CA. Transcriptional response of vitellogenin gene in flatfish to environmental pollutants from two regions of the Gulf of Mexico. Bull Environ Contam Toxicol. 2024; 112(1):11. https://doi.org/10.1007/s00128-023-03825-2

Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA. Stacks: an analysis tool set for population genomics. Mol Ecol. 2013; 22(11):3124–40. https://doi.org/10.1111/mec.12354

Chain FJJ, Feulner PGD, Panchal M, Eizaguirre C, Samonte IE, Kalbe M, Lenz TL, Stoll M, Bornberg-Bauer E, Milinski M, Reusch TBH. Extensive copy-number variation of young genes across stickleback populations. PLoS Genet. 2014; 10(12):e1004830. https://doi.org/10.1371/journal.pgen.1004830

Censo de población y vivienda (CENSO). Región Metropolitana de Santiago. [internet]. 2017. Available from: http://resultados.censo2017.cl/Region?R=R13

Comte S, Vila I. Spawning of Basilichthys microlepidotus (Jenyns). J Fish Biol. 1992; 41(6):971–81. https://doi.org/10.1111/j.1095-8649.1992.tb02724.x

Cortés-Miranda J, Rojas-Hernández N, Muñoz G, Copaja S, Quezada-Romegialli C, Veliz D, Vega-Retter C. Biomarker selection depends on gene function and organ: the case of the cytochrome P450 family genes in freshwater fish exposed to chronic pollution. PeerJ. 2024a; 12:e16925. https://doi.org/10.7717/peerj.16925

Cortés-Miranda J, Veliz D, Rojas-Hernández N, Rico C, Gutiérrez C, Vega-Retter C. Chemical-defensome and whole-transcriptome expression of the silverside fish Basilichthys microlepidotus in response to chronic pollution in the Maipo River basin, Central Chile. Aquat Toxicol. 2024b; 277:107159. https://doi.org/10.1016/j.aquatox.2024.107159

Dorant Y, Cayuela H, Wellband K, Laporte M, Rougemont Q, Mérot C, Normandeau E, Rochette R, Bernatchez L. Copy number variants outperform SNPs to reveal genotype–temperature association in a marine species. Mol Ecol. 2020; 29(24):4765–82. https://doi.org/10.1111/mec.15565

Duarte W, Feito R, Jara C, Moreno C, Orellana A. Ictiofauna del sistema hidrográfico del río Maipo. Bol Mus Nac Hist Nat. 1971; 32:227–68.

Forester BR, Lasky JR, Wagner HH, Urban DL. Comparing methods for detecting multilocus adaptation with multivariate genotype-environment associations. Mol Ecol. 2018; 27(9):2215–33. https://doi.org/10.1111/mec.14584

Guo H, Deng H, Liu H, Jian Z, Cui H, Fang J, Zuo Z, Deng J, Li Y, Wang X. Nickel carcinogenesis mechanism: cell cycle dysregulation. Environ Sci Pollut Res. 2021; 28:4893–901. https://doi.org/10.1007/s11356-020-11764-2

Hara A, Hiramatsu N, Fujita T. Vitellogenesis and choriogenesis in fishes. Fisheries Sci. 2016; 82:187–202. https://doi.org/10.1007/s12562-015-0957-5

Jombart T. adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics. 2008; 24(11):1403–05. https://doi.org/10.1093/bioinformatics/btn129

Karunarathne P, Zhou Q, Schliep K, Milesi P. A comprehensive framework for detecting copy number variants from single nucleotide polymorphism data: ‘rCNV’, a versatile r package for paralogue and CNV detection. Mol Ecol Resour. 2023; 23(8):1772–89. https://doi.org/10.1111/1755-0998.13843

Kohl M, Kohl MM. MKinfer: inferential statistics. R package version 1.1. 2020. Available from: https://github.com/stamats/MKinfer

Lockwood WW, Stack D, Morris T, Grehan D, O’Keane C, Stewart GL, Cumiskey J, Lam WL, Squire JA, Thomas DM. Cyclin E1 is amplified and overexpressed in osteosarcoma. J Mol Diagn. 2011; 13(3):289–96. https://doi.org/10.1016/j.jmoldx.2010.11.020

Lucas ER, Miles A, Harding NJ, Clarkson CS, Lawniczak MKN, Kwiatkowski DP, Weetman D, Donnelly MJ. Whole-genome sequencing reveals high complexity of copy number variation at insecticide resistance loci in malaria mosquitoes. Genome Res. 2019; 29(8):1250–61. https://doi.org/10.1101/gr.245795.118

Mérot C, Oomen RA, Tigano A, Wellenreuther M. A roadmap for understanding the evolutionary significance of structural genomic variation. Trends Ecol Evol. 2020; 35(7):561–72. https://doi.org/10.1016/j.tree.2020.03.002

Mohamadnejad Sangdehi F, Jamsandekar MS, Enbody ED, Pettersson ME, Andersson L. Copy number variation and elevated genetic diversity at immune trait loci in Atlantic and Pacific herring. BMC Genom. 2024; 25(1):459. https://doi.org/10.1186/s12864-024-10380-5

Mushtaq N, Singh DV, Bhat RA, Dervash MA, Hameed OB. Freshwater contamination: sources and hazards to aquatic biota. In: Qadri H, Bhat RA, Mehmood MA, Dar GH, editors. Fresh water pollution dynamics and remediation. Singapore: Springer Singapore; 2020. p.27–50.

Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara R, Simpson GL, Solymos P, Stevens MHH, Wagner H. Package ‘vegan’. Community ecology package. 2013; 1–295.

Orozco LD, Cokus SJ, Ghazalpour A, Ingram-Drake L, Wang S, van Nas A, Che N, Araujo JA, Pellegrini M, Lusis AJ. Copy number variation influences gene expression and metabolic traits in mice. Hum Mol Genet. 2009; 18(21):4118–29. https://doi.org/10.1093/hmg/ddp360

Oziolor EM, Reid NM, Yair S, Lee KM, Guberman VerPloeg S, Bruns PC, Shaw JR, Whitehead A, Matson CW. Adaptive introgression enables evolutionary rescue from extreme environmental pollution. Science. 2019; 364(6439):455–57. https://doi.org/10.1126/science.aav4155

Park D, Propper CR, Wang G, Salanga MC. Synonymous single nucleotide polymorphism in arsenic (+3) methyltransferase of the Western mosquitofish (Gambusia affinis) and its gene expression among field populations. Ecotoxicology. 2021; 30(4):711–18. https://doi.org/10.1007/s10646-021-02376-8

Peña-Guerrero MD, Nauditt A, Muñoz-Robles C, Ribbe L, Meza F. Drought impacts on water quality and potential implications for agricultural production in the Maipo River Basin, Central Chile. Hydrolog Sci J. 2020; 65(6):1005–21. https://doi.org/10.1080/02626667.2020.1711911

Pös O, Radvanszky J, Buglyó G, Pös Z, Rusnakova D, Nagy B, Szemes T. DNA copy number variation: main characteristics, evolutionary significance, and pathological aspects. Biomed J. 2021; 44(5):548–59. https://doi.org/10.1016/j.bj.2021.02.003

Pu W, Chu X, Guo H, Huang G, Cui T, Huang B, Dai X, Zhang C. The activated ATM/AMPK/mTOR axis promotes autophagy in response to oxidative stress-mediated DNA damage co-induced by molybdenum and cadmium in duck testes. Environ Pollut. 2023; 316:120574. https://doi.org/10.1016/j.envpol.2022.120574

R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2024. Available from: https://www.r-project.org/

Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W. Global variation in copy number in the human genome. Nature. 2006; 444(7118):444–54. https://doi.org/10.1038/nature05329

Reid NM, Jackson CE, Gilbert D, Minx P, Montague MJ, Hampton TH, Helfrich LW, King BL, Nacci DE, Aluru N. The landscape of extreme genomic variation in the highly adaptable Atlantic killifish. Genome Biol Evol. 2017; 9(3):659–76. https://doi.org/10.1093/gbe/evx023

Reitzel AM, Karchner SI, Franks DG, Evans BR, Nacci D, Champlin D, Vieira VM, Hahn ME. Genetic variation at aryl hydrocarbon receptor (AHR) loci in populations of Atlantic killifish (Fundulus heteroclitus) inhabiting polluted and reference habitats. BMC Evol Biol. 2014; 14:1–18. https://doi.org/10.1186/1471-2148-14-6

Shi X, Radhakrishnan S, Wen J, Chen JY, Chen J, Lam BA, Mills RE, Stranger BE, Lee C, Setlur SR. Association of CNVs with methylation variation. NPJ Genom Med. 2020; 5(1):41. https://doi.org/10.1038/s41525-020-00145-w

Sigmund G, Ågerstrand M, Antonelli A, Backhaus T, Brodin T, Diamond ML, Erdelen WR, Evers DC, Hofmann T, Hueffer T. Addressing chemical pollution in biodiversity research. Glob Change Biol. 2023; 29(12):3240–55. https://doi.org/10.1111/gcb.16689

Silva FA, Feldberg E, Carvalho NDM, Rangel SMH, Schneider CH, Carvalho-Zilse GA, Fonsêca V, Gross MC. Effects of environmental pollution on the rDNAomics of Amazonian fish. Environ Pollut. 2019; 252:180–87. https://doi.org/10.1016/j.envpol.2019.05.112

Storey J, Bass A, Dabney A, Robinson D. ‘qvalue’: Q-value estimation for false discovery rate control. Available from: https://github.com/StoreyLab/qvalue

Soriano Y, Carmona E, Renovell J, Picó Y, Brack W, Krauss M, Backhaus T, Inostroza PA. Co-occurrence and spatial distribution of organic micropollutants in surface waters of the River Aconcagua and Maipo basins in Central Chile. Sci Total Environ. 2024; 954:176314. https://doi.org/10.1016/j.scitotenv.2024.176314

Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE. 2011; 6(7):e21800. https://doi.org/10.1371/journal.pone.0021800

Ukaogo PO, Ewuzie U, Onwuka CV. Environmental pollution: causes, effects, and the remedies. In: Chowdhary P, Raj A, Verma D, Akhter Y, editors. Microorganisms for sustainable environment and health. Elsevier; 2020. p.419–29.

Vega-Retter C, Munoz-Rojas P, Vila I, Copaja S, Véliz D. Genetic effects of living in a highly polluted environment: the case of the silverside Basilichthys microlepidotus (Jenyns) (Teleostei: Atherinopsidae) in the Maipo River basin, central Chile. Popul Ecol. 2014; 56:569–79. https://doi.org/10.1007/s10144-014-0444-3

Vega-Retter C, Rojas-Hernández N, Cortés-Miranda J, Véliz D, Rico C. Genome scans reveal signals of selection associated with pollution in fish populations of Basilichthys microlepidotus, an endemic species of Chile. Sci Rep. 2024; 14(1):15727. https://doi.org/10.1038/s41598-024-66121-x

Vega-Retter C, Rojas-Hernandez N, Vila I, Espejo R, Loyola D, Copaja S, Briones M, Nolte A, Véliz D. Differential gene expression revealed with RNA-Seq and parallel genotype selection of the ornithine decarboxylase gene in fish inhabiting polluted areas. Sci Rep. 2018; 8(1):4820. https://doi.org/10.1038/s41598-018-23182-z

Vega-Retter C, Vila I, Véliz D. Signatures of directional and balancing selection in the silverside Basilichthys microlepidotus (Teleostei: Atherinopsidae) inhabiting a polluted river. Evol Biol. 2015; 42:156–68. https://doi.org/10.1007/s11692-015-9307-x

Véliz Baeza D, Catalan L, Pardo R, Acuna P, Díaz Lorca A, Poulin E, Vila Pinto I. The genus Basilichthys (Teleostei: Atherinopsidae) revisited along its Chilean distribution range (21° to 40° S) using variation in morphology and mtDNA. Rev Chil Hist Nat. 2012; 85(1):49–59. http://dx.doi.org/10.4067/S0716-078X2012000100004

Veliz D, Rojas-Hernández N, Copaja SV, Vega-Retter C. Temporal changes in gene expression and genotype frequency of the ornithine decarboxylase gene in native silverside Basilichthys microlepidotus: Impact of wastewater reduction due to implementation of public policies. Evol Appl. 2020; 13(6):1183–94. https://doi.org/10.1111/eva.13000

Wilkinson JL, Boxall AB, Kolpin DW, Leung KM, Lai RW, Galbán-Malagón C, Teta C. Pharmaceutical pollution of the world’s rivers. PNAS. 2022; 119(8):e2113947119. https://doi.org/10.1073/pnas.2113947119

Wu S, Su R, Jia H. Cyclin B2 (CCNB2) stimulates the proliferation of triple-negative breast cancer (TNBC) cells in vitro and in vivo. Dis Markers. 2021; 2021(1):5511041. https://doi.org/10.1155/2021/5511041

Xu T, Zhao J, Yin D, Zhao Q, Dong B. High-throughput RNA sequencing reveals the effects of 2,2′,4,4′-tetrabromodiphenyl ether on retina and bone development of zebrafish larvae. BMC Genom. 2015; 16:1–12. https://doi.org/10.1186/s12864-014-1194-5

Yilmaz O, Sullivan CV, Bobe J, Norberg B. The role of multiple vitellogenins in early development of fishes. Gen Comp Endocrinol. 2024; 351:114479. https://doi.org/10.1016/j.ygcen.2024.114479

Zhou J, Lemos B, Dopman EB, Hartl DL. Copy-number variation: the balance between gene dosage and expression in Drosophila melanogaster. Genome Biol Evol. 2011; 3:1014–24. https://doi.org/10.1093/gbe/evr023

Authors


Jorge Cortés-Miranda1, David Veliz1, Ciro Rico2 and Caren Vega-Retter1

[1]    Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Las Palmeras 3425, 7800003 Ñuñoa, Santiago, Chile. (CVR) carenvega@uchile.cl (corresponding author), (DV) dveliz@uchile.cl, (JCM) jorge.cortes.m@ug.uchile.cl.

[2]    Instituto de Ciencias Marinas de Andalucía (ICMAN), Consejo Superior de Investigaciones Científicas (CSIC), Campus Universitario Río San Pedro, C. Republica Saharaui, 4, 11519 Puerto Real, Cádiz, Spain. (CR) ciro.rico@csic.es.

Authors’ Contribution


Jorge Cortés-Miranda: Conceptualization, Formal analysis, Funding acquisition, Methodology, Visualization, Writing-original draft.

David Veliz: Conceptualization, Resources, Supervision, Writing-original draft.

Ciro Rico: Conceptualization, Funding acquisition, Writing-original draft, Writing-review and editing.

Caren Vega-Retter: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing-original draft.

Ethical Statement


All sampled fish were handled according to protocols approved by the Ethics Committee of the Universidad de Chile and in compliance with Chilean law (Resolución Exenta number 3078, Subsecretaría de Pesca).

Competing Interests


The authors declare no competing interests.

How to cite this article


Cortés-Miranda J, Veliz D, Rico C, Vega-Retter C. Copy number variations in response to chronic pollution: Basilichthys microlepidotus in central Chile. Neotrop Ichthyol. 2025; 23(1):e240083. https://doi.org/10.1590/1982-0224-2024-0083


This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Distributed under

Creative Commons CC-BY 4.0

© 2025 The Authors.

Diversity and Distributions Published by SBI

Accepted January 24, 2025 by Franco Teixeira de Mello

Submitted August 23, 2024

Epub March 24, 2025