A Short Introduction to Algebraic Taxonomy on Genes Regions

19th March 2022 by Robersy

Herein, we present a short and simple introduction to the algebraic taxonomy of genomic/gene regions. It is based on the analysis of DNA mutational events in a DNA multiple sequence alignment (MSA) by means of automorphisms between pairs of aligned DNA sequences, algebraically represented as elements of a finite Abelian group.

Overview

GenomAutomorphism is an R package to compute the automorphisms between pairwise aligned DNA sequences represented as elements of a Genomic Abelian group, as described in reference (1). In a general scenario, whole chromosomes or genomic regions from a population (from any species or from closely related species) can be algebraically represented as a direct sum of cyclic groups or, more specifically, of Abelian p-groups. Basically, we propose to represent a multiple sequence alignment (MSA) of length N as a finite Abelian group built as the direct sum of Abelian groups of prime-power order:

$\qquad G = (\mathbb{Z}_{p^{\alpha_{1}}_1})^{n_1} \oplus (\mathbb{Z}_{p^{\alpha_{2}}_2})^{n_2} \oplus \dots \oplus (\mathbb{Z}_{p^{\alpha_{k}}_k})^{n_k}$

where the $p_i$'s are prime numbers, $\alpha_i \in \mathbb{N}$, and $\mathbb{Z}_{p^{\alpha_{i}}_i}$ is the group of integers modulo $p^{\alpha_{i}}_i$.

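For the purpose of estimating the automorphisms between two aligned DNA sequences, the prime powers are restricted to $p^{\alpha_{i}}_i \in \{5, 2^6, 5^3 \}$, that is, to the groups $\mathbb{Z}_5$, $\mathbb{Z}_{64}$ and $\mathbb{Z}_{125}$.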
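As a minimal illustration, assume the simplest case in which every summand is $\mathbb{Z}_{64}$, one per codon position, so that a codon sequence of length $N$ is an element of $(\mathbb{Z}_{64})^{N}$. The base-R sketch below stores such elements as integer vectors. The helper names `zn_add` and `zn_inverse` are hypothetical and are not part of GenomAutomorphism. It only shows that the group operation of a direct sum is component-wise addition modulo 64.

```r
## Hypothetical helpers (not from GenomAutomorphism): elements of (Z_64)^N
## are stored as integer vectors whose entries lie in 0..63.
zn_add <- function(x, y, n = 64L) {
    stopifnot(length(x) == length(y))
    (x + y) %% n                  # component-wise sum modulo n
}

zn_inverse <- function(x, n = 64L) {
    (n - x) %% n                  # additive inverse of every component
}

x <- c(50L, 11L, 60L)             # three codon coordinates
y <- c(20L, 60L, 10L)

zn_add(x, y)                      # 6 7 6
zn_add(x, zn_inverse(x))          # 0 0 0, the identity element
```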

Automorphisms

Herein, automorphisms are considered algebraic descriptions of the mutational events observed in codon sequences represented on different Abelian groups. In particular, as described in references (3-4), each representation of the codon set on a defined Abelian group has 24 possible isomorphic Abelian groups. These Abelian groups can be labeled by the order of DNA bases used to generate them. The set of 24 Abelian groups can itself be described as a group isomorphic to the symmetric group of degree four ($S_4$, see reference (4)).
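Because each cube is labeled by an ordering of the four bases, there are exactly $4! = 24$ labels, matching the order of $S_4$. The base-R sketch below enumerates those labels. The recursive helper `permutations` is hypothetical, and the sketch does not build the group operation on any cube. It only lists the 24 labels, including the ones used later in this tutorial.

```r
## Hypothetical helper: all orderings of a character vector.
permutations <- function(v) {
    if (length(v) <= 1) return(list(v))
    out <- list()
    for (i in seq_along(v)) {
        ## put v[i] first, then every ordering of the remaining bases
        for (p in permutations(v[-i])) {
            out[[length(out) + 1]] <- c(v[i], p)
        }
    }
    out
}

## Each ordering of the four DNA bases labels one genetic-code cube
cubes <- vapply(permutations(c("A", "C", "G", "T")),
                paste, character(1), collapse = "")
length(cubes)   # 24 = 4!, the order of S_4
head(cubes)
```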

For further background on the symmetric group of the 24 Abelian groups of genetic-code cubes, users can also consult Symmetric Group of the Genetic-Code Cubes, specifically the Mathematica notebook IntroductionToZ5GeneticCodeVectorSpace.nb. It can be explored interactively with Wolfram Player, which is freely available (for Windows and Linux) at https://www.wolfram.com/player/. Tutorials on how to use GenomAutomorphism in the analysis of mutational events are available at its website: https://genomaths.github.io/genomautomorphism/. The source package is available on GitHub at: https://github.com/genomaths/GenomAutomorphism/.

The required packages can be installed by typing in the R console:

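install.packages(c("party", "partykit", "data.table", "ggplot2", "ggparty", "dplyr", "devtools", "BiocManager"), dependencies = TRUE)
BiocManager::install("Biostrings")  # Biostrings is distributed through Bioconductor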

In particular, you might need to install the CHAID package, which is available from R-Forge, by typing in the R console:

install.packages("CHAID", repos="http://R-Forge.R-project.org")

The GenomAutomorphism package can be installed from GitHub:

devtools::install_git("https://github.com/genomaths/GenomAutomorphism.git")

Once all the required packages are installed, we load them:

library(GenomAutomorphism)
library(Biostrings)
library(party)
library(partykit)
library(data.table)
library(ggplot2)
library(ggparty)
library(dplyr)
library(CHAID)

Next, we check the DNA multiple sequence alignment (MSA) file. This is a FASTA file carrying the MSA of the primate BRCA1 DNA repair gene. If we are already familiar with the FASTA file, it is better to read it directly with function automorphism. However, this step can be skipped in the current example, because the MSA is provided together with the GenomAutomorphism R package.

## Not necessary to run: this MSA is included with the package
URL <- paste0("https://github.com/genomaths/seqalignments/raw/master/BRCA1/",
              "brca1_primates_dna_repair_20_sequences.fasta")
brca1_aln <- readDNAMultipleAlignment(filepath = URL)

Load the MSA included with the package:

data("brca1_aln", package = "GenomAutomorphism")
brca1_aln

DNAMultipleAlignment with 20 rows and 2283 columns
      aln                                                   names               
 [1] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA NM_007298.3:20-22...
 [2] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA U64805.1:1-2280_H...
 [3] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_031011560.1:23...
 [4] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_031011561.1:23...
 [5] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_031011562.1:16...
 [6] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_009432101.3:27...
 [7] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_009432104.3:37...
 [8] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_016930487.2:37...
 [9] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_009432099.3:37...
 ... ...
[12] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_034941185.1:24...
[13] ATGGATTTATCTGCTCTTCGCGTTG...TCAGATCCCCCACAGCCACTACTGA XM_034941182.1:25...
[14] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163757.1:14...
[15] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163756.1:14...
[16] ATGGATTTATCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_032163758.1:13...
[17] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_030923119.1:18...
[18] ATGGATTTATCTGCTCTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_030923118.1:18...
[19] ATGGATTTACCTGCTGTTCGCGTTG...CCAGATCCCCCACAGCCACTACTGA XM_025363316.1:14...
[20] ATGGATTTATCTGCTGTTCGTGTTG...CCAGATCCCTCACAGCCACTACTGA XM_039475995.1:49...

The sequence names (truncated to 100 characters):

strtrim(names(brca1_aln@unmasked), 100)

‘NM_007298.3:20-2299_Homo_sapiens_BRCA1_DNA_repair_associated_(BRCA1)_transcript_variant_4_mRNA’
‘U64805.1:1-2280_Homo_sapiens_Brca1-delta11b_(Brca1)_mRNA_complete_cds’
‘XM_031011560.1:233-2515_PREDICTED:_Gorilla_gorilla_gorilla_BRCA1_DNA_repair_associated_(BRCA1)_trans’
‘XM_031011561.1:233-2512_PREDICTED:_Gorilla_gorilla_gorilla_BRCA1_DNA_repair_associated_(BRCA1)_trans’
‘XM_031011562.1:163-2442_PREDICTED:_Gorilla_gorilla_gorilla_BRCA1_DNA_repair_associated_(BRCA1)_trans’
‘XM_009432101.3:276-2555_PREDICTED:_Pan_troglodytes_BRCA1_DNA_repair_associated_(BRCA1)_transcript_va’
‘XM_009432104.3:371-2650_PREDICTED:_Pan_troglodytes_BRCA1_DNA_repair_associated_(BRCA1)_transcript_va’
‘XM_016930487.2:371-2650_PREDICTED:_Pan_troglodytes_BRCA1_DNA_repair_associated_(BRCA1)_transcript_va’
‘XM_009432099.3:371-2653_PREDICTED:_Pan_troglodytes_BRCA1_DNA_repair_associated_(BRCA1)_transcript_va’
‘XM_034941183.1:254-2533_PREDICTED:_Pan_paniscus_BRCA1_DNA_repair_associated_(BRCA1)_transcript_varia’
‘XM_034941184.1:254-2533_PREDICTED:_Pan_paniscus_BRCA1_DNA_repair_associated_(BRCA1)_transcript_varia’
‘XM_034941185.1:248-2527_PREDICTED:_Pan_paniscus_BRCA1_DNA_repair_associated_(BRCA1)_transcript_varia’
‘XM_034941182.1:254-2536_PREDICTED:_Pan_paniscus_BRCA1_DNA_repair_associated_(BRCA1)_transcript_varia’
‘XM_032163757.1:145-2418_PREDICTED:_Hylobates_moloch_BRCA1_DNA_repair_associated_(BRCA1)_transcript_v’
‘XM_032163756.1:145-2421_PREDICTED:_Hylobates_moloch_BRCA1_DNA_repair_associated_(BRCA1)_transcript_v’
‘XM_032163758.1:139-2412_PREDICTED:_Hylobates_moloch_BRCA1_DNA_repair_associated_(BRCA1)_transcript_v’
‘XM_030923119.1:184-2463_PREDICTED:_Rhinopithecus_roxellana_BRCA1_DNA_repair_associated_(BRCA1)_trans’
‘XM_030923118.1:183-2465_PREDICTED:_Rhinopithecus_roxellana_BRCA1_DNA_repair_associated_(BRCA1)_trans’
‘XM_025363316.1:147-2426_PREDICTED:_Theropithecus_gelada_BRCA1_DNA_repair_associated_(BRCA1)_transcri’
‘XM_039475995.1:49-2328_PREDICTED:_Saimiri_boliviensis_boliviensis_BRCA1_DNA_repair_associated_(BRCA1’

Next, function automorphism is applied to represent the codon sequences in the Abelian group $\mathbb{Z}_{64}$ (i.e., the set of integers modulo 64). The codon coordinates are requested on the cube ACGT. Following reference (4), cubes are labeled by the order of DNA bases used to define the sum operation.
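The printed output of brca1_autm lists each codon next to its $\mathbb{Z}_{64}$ coordinate on cube ACGT (for example ATG → 50, GAT → 11, TGA → 44). All of those values fit the rule below. We infer this rule from the printed output, not from the package documentation, so treat it as an assumption. Bases are numbered by their position in the cube label (A = 0, C = 1, G = 2, T = 3 for ACGT), and a codon $b_1 b_2 b_3$ gets the coordinate $16\,b_2 + 4\,b_1 + b_3$. The helper `codon_coord` is hypothetical.

```r
## Hypothetical helper: coordinate of a codon in Z_64 on a given cube.
## The weighting (second base most significant) is inferred from the
## printed brca1_autm output, not taken from the package source.
codon_coord <- function(codon, cube = "ACGT") {
    bases <- strsplit(cube, "")[[1]]
    b <- match(strsplit(codon, "")[[1]], bases) - 1L   # A=0, C=1, G=2, T=3
    if (length(b) != 3 || anyNA(b)) stop("expected a codon over the cube's bases")
    16L * b[2] + 4L * b[1] + b[3]
}

codon_coord("ATG")   # 50
codon_coord("GAT")   # 11
codon_coord("TGA")   # 44
```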

In $\mathbb{Z}_{64}$, automorphisms are described as functions $f(x) = k\,x \bmod 64$, where $k$ and $x$ are elements of the set of integers modulo 64. In the call to function automorphism below, three important arguments are set: group = "Z64", cube = c("ACGT", "TGCA"), and cube_alt = c("CATG", "GTAC"). Argument group specifies the group on which the automorphisms are computed. It can be "Z5", "Z64", "Z125", or "Z5^3".
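For a single codon position, the coefficient $k$ of $f(x) = k\,x \bmod 64$ can be recovered from the two coordinates. The brute-force sketch below lists every $k$ in 0..63 that satisfies the congruence. The helper `find_k` is hypothetical and is not necessarily how the package computes $k$. The coordinate pairs come from the human comparisons printed in this tutorial (CAT → CGT is 7 → 39, reported with autm 33). When $x$ is odd, and therefore a unit of $\mathbb{Z}_{64}$, the solution is unique. When $x$ is even, several values of $k$ can satisfy the equation.

```r
## Hypothetical helper: all k in 0..63 with (k * x) mod 64 == y.
find_k <- function(x, y, n = 64L) {
    k <- 0:(n - 1L)
    k[(k * x) %% n == y %% n]
}

find_k(7, 39)    # 33  (CAT -> CGT)
find_k(31, 23)   # 9   (TCT -> CCT)
find_k(10, 8)    # 20 52: x is even, so k is not unique (52 was reported)
```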

In groups "Z64" and "Z125", not all mutational events can be described as automorphisms on a single cube. Therefore, a pair of "dual" genetic-code cubes, as defined in references (1-4), is given as argument cube. That is, the bases at corresponding positions of the two cubes must be complementary to each other. Such a cube pair is called a pair of dual cubes and, as shown in reference (4), each pair forms a group. If automorphisms are not found on the first pair of dual cubes, the algorithm searches for them on the alternative pair of dual cubes given in cube_alt.
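The two cube pairs passed to automorphism (ACGT/TGCA and CATG/GTAC) satisfy this complementarity. A one-line base-R check makes this explicit. `dual_cube` is a hypothetical helper, not a GenomAutomorphism function.

```r
## Hypothetical helper: the dual cube replaces every base by its complement.
dual_cube <- function(cube) {
    chartr("ACGT", "TGCA", cube)   # A<->T, C<->G, position by position
}

dual_cube("ACGT")   # "TGCA"
dual_cube("CATG")   # "GTAC"
```

Taking the complement twice gives back the original label, so every cube has exactly one dual.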

## Not necessary to run: brca1_autm is included with the package
nams <- c("human_1","human_2","gorilla_1","gorilla_2","gorilla_3",
        "chimpanzee_1","chimpanzee_2","chimpanzee_3","chimpanzee_4",
        "bonobos_1","bonobos_2","bonobos_3","bonobos_4","silvery_gibbon_1",
        "silvery_gibbon_2","silvery_gibbon_3","golden_monkey_1",
        "golden_monkey_2","gelada_baboon","bolivian_monkey")

brca1_autm <- automorphism(
                      seqs = brca1_aln, 
                      group = "Z64", 
                      cube = c("ACGT", "TGCA"),
                      cube_alt = c("CATG", "GTAC"),
                      nms = nams, 
                      verbose = FALSE)

Object brca1_autm is included with the package and can be loaded by typing:

data(brca1_autm, package = "GenomAutomorphism")
brca1_autm

AutomorphismList object of length: 190
names(190): human_1.human_2 human_1.gorilla_1 human_1.gorilla_2 ... golden_monkey_2.gelada_baboon golden_monkey_2.bolivian_monkey gelada_baboon.bolivian_monkey 
------- 
Automorphism object with 761 ranges and 6 metadata columns:
        seqnames    ranges strand |        seq1        seq2    coord1    coord2
           <Rle> <IRanges>  <Rle> | <character> <character> <numeric> <numeric>
    [1]        1         1      + |         ATG         ATG        50        50
    [2]        1         2      + |         GAT         GAT        11        11
    [3]        1         3      + |         TTA         TTA        60        60
    [4]        1         4      + |         TCT         TCT        31        31
    [5]        1         5      + |         GCT         GCT        27        27
    ...      ...       ...    ... .         ...         ...       ...       ...
  [757]        1       757      + |         CAC         CAC         5         5
  [758]        1       758      + |         AGC         AGC        33        33
  [759]        1       759      + |         CAC         CAC         5         5
  [760]        1       760      + |         TAC         TAC        13        13
  [761]        1       761      + |         TGA         TGA        44        44
             autm        cube
        <numeric> <character>
    [1]         1        ACGT
    [2]         1        ACGT
    [3]         1        ACGT
    [4]         1        ACGT
    [5]         1        ACGT
    ...       ...         ...
  [757]         1        ACGT
  [758]         1        ACGT
  [759]         1        ACGT
  [760]         1        ACGT
  [761]         1        ACGT
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
...
<189 more DFrame element(s)>
Two slots: 'DataList' & 'SeqRanges'
-------

Grouping automorphisms by coefficient

Automorphisms with the same coefficient can be grouped with function automorphismByCoef. To save time, its output is included in the package.
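The ranges printed for human_1.human_2 (e.g., 1-238 and 511-761) suggest that consecutive codon positions sharing the same coefficient are collapsed into a single range. The base-R sketch below illustrates that idea with `rle()` on a toy coefficient vector. This is our assumption about the behavior, not the implementation of automorphismByCoef.

```r
## Toy vector of automorphism coefficients along six codon positions
## (illustrative values, not taken from brca1_autm).
autm <- c(1, 1, 1, 33, 1, 1)

runs <- rle(autm)                   # runs of equal consecutive values
end <- cumsum(runs$lengths)         # last position of each run
start <- end - runs$lengths + 1     # first position of each run

## three ranges: 1-3 (autm 1), 4-4 (autm 33), 5-6 (autm 1)
data.frame(start = start, end = end, autm = runs$values)
```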

## Not necessary to run: the output is included with the package
autby_coef <- automorphismByCoef(x = brca1_autm,
                                 verbose = FALSE)

Object autby_coef is included with the package and can be loaded by typing:

data(autby_coef, package = "GenomAutomorphism")
autby_coef

AutomorphismByCoefList object of length 190:
$human_1.human_2
AutomorphismByCoef object with 239 ranges and 5 metadata columns:
        seqnames    ranges strand |        seq1        seq2      autm
           <Rle> <IRanges>  <Rle> | <character> <character> <numeric>
    [1]        1     1-238      + |         ATG         ATG         1
    [2]        1     1-238      + |         GAT         GAT         1
    [3]        1     1-238      + |         TTA         TTA         1
    [4]        1     1-238      + |         TCT         TCT         1
    [5]        1     1-238      + |         GCT         GCT         1
    ...      ...       ...    ... .         ...         ...       ...
  [235]        1   511-761      + |         CCC         CCC         1
  [236]        1   511-761      + |         CTT         CTT         1
  [237]        1   511-761      + |         CCT         CCT         1
  [238]        1   511-761      + |         ATA         ATA         1
  [239]        1   511-761      + |         TGA         TGA         1
           mut_type        cube
        <character> <character>
    [1]         HHH        ACGT
    [2]         HHH        ACGT
    [3]         HHH        ACGT
    [4]         HHH        ACGT
    [5]         HHH        ACGT
    ...         ...         ...
  [235]         HHH        ACGT
  [236]         HHH        ACGT
  [237]         HHH        ACGT
  [238]         HHH        ACGT
  [239]         HHH        ACGT
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

...
<189 more elements>

Next, we focus on the mutational events with respect to humans (taken as reference).

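nams <- names(brca1_autm)
## Pairwise comparisons involving a human sequence ("human_1." or "human_2." prefix)
idx1 <- grep("human_1.", nams, fixed = TRUE)
idx2 <- grep("human_2.", nams, fixed = TRUE)
idx <- union(idx1, idx2)
h_brca1_autm <- unlist(brca1_autm[ idx ])
## Keep only the positions where a mutational event occurred (autm != 1)
h_brca1_autm <- h_brca1_autm[ which(h_brca1_autm$autm != 1) ]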
h_brca1_autm

Automorphism object with 1397 ranges and 6 metadata columns:
                          seqnames    ranges strand |        seq1        seq2
                             <Rle> <IRanges>  <Rle> | <character> <character>
          human_1.human_2        1       239      + |         CAT         CGT
          human_1.human_2        1       253      + |         GCA         GTA
          human_1.human_2        1       323      + |         TCT         CCT
          human_1.human_2        1       333      + |         TCT         TCC
          human_1.human_2        1       350      + |         ---         ---
                      ...      ...       ...    ... .         ...         ...
  human_2.bolivian_monkey        1       716      + |         AAT         AGT
  human_2.bolivian_monkey        1       726      + |         GAG         GAA
  human_2.bolivian_monkey        1       730      + |         GTG         GTA
  human_2.bolivian_monkey        1       731      + |         ACC         ACT
  human_2.bolivian_monkey        1       756      + |         CCC         CCT
                             coord1    coord2      autm        cube
                          <numeric> <numeric> <numeric> <character>
          human_1.human_2         7        39        33        ACGT
          human_1.human_2        24        56        21        ACGT
          human_1.human_2        31        23         9        ACGT
          human_1.human_2        31        29         3        ACGT
          human_1.human_2        NA        NA        -1        Gaps
                      ...       ...       ...       ...         ...
  human_2.bolivian_monkey         3        35        33        ACGT
  human_2.bolivian_monkey        10         8        52        ACGT
  human_2.bolivian_monkey        58        56        12        ACGT
  human_2.bolivian_monkey        17        19        35        ACGT
  human_2.bolivian_monkey        21        23        59        ACGT
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Bar plot of the automorphism distribution by cube

The automorphism distribution by cube can be summarized in a bar plot.

Object autby_coef carries all the pairwise comparisons, but it is enough to use a single species, e.g., humans, as reference.

First, the automorphisms from the human comparisons are grouped by coefficient:

h_autby_coef <- automorphismByCoef(x = h_brca1_autm)
h_autby_coef

AutomorphismByCoef object with 1395 ranges and 5 metadata columns:
                           seqnames    ranges strand |        seq1        seq2
                              <Rle> <IRanges>  <Rle> | <character> <character>
     human_1.gelada_baboon        1         4      + |         TCT         CCT
     human_2.gelada_baboon        1         4      + |         TCT         CCT
  human_1.silvery_gibbon_1        1         6      + |         CTT         GTT
  human_1.silvery_gibbon_1        1         6      + |         CTT         GTT
  human_1.silvery_gibbon_3        1         6      + |         CTT         GTT
                       ...      ...       ...    ... .         ...         ...
         human_2.bonobos_2        1       753      + |         CCC         CCT
         human_2.bonobos_3        1       753      + |         CCC         CCT
         human_2.bonobos_4        1       753      + |         CCC         CCT
   human_1.bolivian_monkey        1       756      + |         CCC         CCT
   human_2.bolivian_monkey        1       756      + |         CCC         CCT
                                autm    mut_type        cube
                           <numeric> <character> <character>
     human_1.gelada_baboon         9         YHH        ACGT
     human_2.gelada_baboon         9         YHH        ACGT
  human_1.silvery_gibbon_1        29         SHH        ACGT
  human_1.silvery_gibbon_1        29         SHH        ACGT
  human_1.silvery_gibbon_3        29         SHH        ACGT
                       ...       ...         ...         ...
         human_2.bonobos_2        59         HHY        ACGT
         human_2.bonobos_3        59         HHY        ACGT
         human_2.bonobos_4        59         HHY        ACGT
   human_1.bolivian_monkey        59         HHY        ACGT
   human_2.bolivian_monkey        59         HHY        ACGT
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

Every single-base mutational event across the MSA was classified according to the IUPAC nomenclature: 1) by the number of hydrogen bonds (on the DNA/RNA double helix): strong S = {C, G} (three hydrogen bonds) and weak W = {A, U} (two hydrogen bonds); 2) by chemical type: purines R = {A, G} and pyrimidines Y = {C, U}; 3) by the presence of amino or keto groups on the base rings: amino M = {C, A} and keto K = {G, T}. Constant (held) base positions were labeled with the letter H. So, a codon labeled HKH means that the first and third bases remain constant, while a mutational event between bases G and T was found at the second base in the MSA.
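Each of the six unordered pairs of distinct bases belongs to exactly one of the classes S, W, R, Y, M and K. A codon pair can therefore be labeled unambiguously, position by position. Below is a minimal base-R sketch that writes T in place of U for DNA. `classify_codon_pair` is a hypothetical helper, not the package's code. Its examples reproduce labels printed in h_autby_coef (TCT → CCT is YHH, CTT → GTT is SHH).

```r
## Hypothetical helper: label a codon pair position by position.
## Only the bases A, C, G and T are handled (no gaps).
classify_codon_pair <- function(codon1, codon2) {
    ## Each unordered pair of distinct bases maps to exactly one IUPAC class
    iupac <- c(AG = "R", CT = "Y", CG = "S", AT = "W", AC = "M", GT = "K")
    b1 <- strsplit(codon1, "")[[1]]
    b2 <- strsplit(codon2, "")[[1]]
    stopifnot(length(b1) == 3, length(b2) == 3)
    labels <- vapply(1:3, function(i) {
        if (b1[i] == b2[i]) return("H")           # base held constant
        iupac[[paste(sort(c(b1[i], b2[i])), collapse = "")]]
    }, character(1))
    paste(labels, collapse = "")
}

classify_codon_pair("TCT", "CCT")   # "YHH"
classify_codon_pair("CTT", "GTT")   # "SHH"
classify_codon_pair("GTT", "GGT")   # "HKH"
```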

nams <- names(h_autby_coef)
## Strip the "human_1." / "human_2." prefix and the sequence index suffix
## (e.g., "human_1.gorilla_2" -> "gorilla")
nams <- sub("human[_][1-2][.]", "", nams)
nams <- sub("[_][1-6]$", "", nams)
dt <- data.frame(h_autby_coef, species = nams)
dt <- dt[, c("start", "autm", "species", "mut_type", "cube")]
DataFrame(dt)

DataFrame with 1395 rows and 5 columns
         start      autm         species    mut_type        cube
     <integer> <numeric>     <character> <character> <character>
1            4         9   gelada_baboon         YHH        ACGT
2            4         9   gelada_baboon         YHH        ACGT
3            6        29  silvery_gibbon         SHH        ACGT
4            6        29  silvery_gibbon         SHH        ACGT
5            6        29  silvery_gibbon         SHH        ACGT
...        ...       ...             ...         ...         ...
1391       753        59         bonobos         HHY        ACGT
1392       753        59         bonobos         HHY        ACGT
1393       753        59         bonobos         HHY        ACGT
1394       756        59 bolivian_monkey         HHY        ACGT
1395       756        59 bolivian_monkey         HHY        ACGT

Nominal variables are transformed into factors:

dt$start <- as.numeric(dt$start)
dt$autm <- as.numeric(dt$autm)
dt$cube <- as.factor(dt$cube)
dt$species <- as.factor(dt$species)
dt$mut_type <- as.factor(dt$mut_type)

Finally, the bar plot is built by typing:

counts <- table(dt$cube)
par(family = "serif", cex = 1, font = 2, mar = c(4, 6, 4, 4))
## barplot() returns the x-coordinates of the bar midpoints
bp <- barplot(counts, #main="Automorphism distribution",
        xlab = "Genetic-code cube representation",
        ylab = "Fixed mutational events",
        col = c("darkblue", "red", "darkgreen", "magenta", "orange"),
        border = NA, axes = FALSE, #ylim = c(0, 6000),
        cex.lab = 2, cex.main = 1.5, cex.names = 2)
axis(2, at = c(0, 200, 400, 600, 800, 1000), cex.axis = 1.5)
## Print the count of each cube inside its bar
mtext(side = 1, line = -3, at = bp,
      text = paste0(counts), cex = 2,
      col = c("white", "red", "yellow", "black"))

Classification Tree with Chi-squared Automated Interaction Detection (CHAID)

The current CHAID implementation only accepts nominal or ordinal categorical predictors. Continuous predictors must be transformed into ordinal predictors before the algorithm is applied. Here, we create an ordinal variable autms from variable autm, and then define and encode the variables of interest.

interval <- function(x, a, b) {
    x >= a & x <= b
}
datos <- dt
datos$autms <- case_when(datos$autm < 16 ~ 'A1',
                  interval(datos$autm, 16, 31) ~ 'A2',
                  interval(datos$autm, 32, 47) ~ 'A3',
                  datos$autm > 47 ~ 'A4')
datos$autms <- as.factor(datos$autms)
datos$mut_type <- as.character(datos$mut_type)
datos$mut_type[ which(datos$cube == "Trnl") ] <- "indel"
datos$mut_type[ which(datos$cube == "Gaps") ] <- "---"
datos$mut_type <- as.factor(datos$mut_type)
datos$regions <- case_when(datos$start < 230 ~ 'R0',
                  interval(datos$start, 230, 270) ~ 'R1',
                  interval(datos$start, 271, 305) ~ 'R2',
                  interval(datos$start, 306, 338) ~ 'R3',
                  interval(datos$start, 339, 533) ~ 'R4',
                  interval(datos$start, 534, 570) ~ 'R5',
                  interval(datos$start, 571, 653) ~ 'R6',
                  interval(datos$start, 654, 709) ~ 'R7',
                  datos$start > 709 ~ 'R8')
datos$regions <- as.factor(datos$regions)
datos$autm <- as.factor(datos$autm)
datos$species <- as.factor(datos$species)
datos$start <- as.factor(datos$start)
datos$cube <- as.factor(datos$cube)
datos <- datos[, c( "autms", "regions", "mut_type", "cube", "species")]
DataFrame(datos)

DataFrame with 1395 rows and 5 columns
        autms  regions mut_type     cube         species
     <factor> <factor> <factor> <factor>        <factor>
1          A1       R0      YHH     ACGT  gelada_baboon 
2          A1       R0      YHH     ACGT  gelada_baboon 
3          A2       R0      SHH     ACGT  silvery_gibbon
4          A2       R0      SHH     ACGT  silvery_gibbon
5          A2       R0      SHH     ACGT  silvery_gibbon
...       ...      ...      ...      ...             ...
1391       A4       R8      HHY     ACGT bonobos        
1392       A4       R8      HHY     ACGT bonobos        
1393       A4       R8      HHY     ACGT bonobos        
1394       A4       R8      HHY     ACGT bolivian_monkey
1395       A4       R8      HHY     ACGT bolivian_monkey
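The `case_when()` binning of autm into the classes A1-A4 shown above can also be written in base R with `cut()`. The following minimal sketch assumes that the coefficients are integers, as in this dataset, with -1 marking gaps. For non-integer values, the two versions would disagree near the class boundaries. Passing `ordered_result = TRUE` to `cut()` would additionally make the factor explicitly ordinal.

```r
## Automorphism coefficients in Z_64 are integers; -1 marks gaps.
autm <- c(-1, 0, 15, 16, 31, 32, 47, 48, 63)

## Right-closed intervals: (-Inf,15], (15,31], (31,47], (47,Inf]
autms <- cut(autm,
             breaks = c(-Inf, 15, 31, 47, Inf),
             labels = c("A1", "A2", "A3", "A4"))

as.character(autms)
## "A1" "A1" "A1" "A2" "A2" "A3" "A3" "A4" "A4"
```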

A classification tree is estimated with CHAID algorithm:

ctrl <- chaid_control(minsplit = 200, minprob = 0.8, alpha2 = 0.01, alpha4 = 0.01)
chaid_res <- chaid(species ~ autms + regions + mut_type + cube , data = datos,
                   control = ctrl)

Plotting the CHAID tree¶

Next, the data must be prepared for plotting the tree with ggparty:

##  Updating CHAID decision tree
dp <- data_party(chaid_res)
dat <- dp[, c("autms", "regions", "mut_type", "cube")]
dat$species <- dp[, "(response)"]
    
    
chaid_tree <- party(node = node_party(chaid_res), 
                    data = dat,
                    fitted =  dp[, c("(fitted)", "(response)")], 
                    names = names(chaid_res))
## Extract the smallest adjusted p-value at each inner node
pvals <- unlist(nodeapply(chaid_tree, ids = nodeids(chaid_tree), function(n) {
    pvals <- info_node(n)$adjpvals
    pvals <- pvals[ which.min(pvals) ]
    return(pvals)
}))
pvals <- pvals[ pvals < 0.05 ]
## Counts of events per species on each node
node.freq <- sapply(seq_along(chaid_tree), function(id) {
    y <- data_party(chaid_tree, id = id)
    y <- y[[ "(response)" ]]
    table(y)
})
## Total counts on each node
node.size <- colSums(node.freq)

Plotting the tree with ggparty (font sizes adjusted for HTML output):

options(repr.plot.width = 24, repr.plot.height = 20)


ggparty(chaid_tree) +
    geom_edge(aes(color = id, size = node.size[id]/300), show.legend = FALSE) +
    geom_edge_label(size = 6, colour = "red",
                    fontface = "bold", 
                    shift = 0.64, 
                    nudge_x = -0.01,
                    max_length = 10,
                    splitlevels = 1:4) +
    geom_node_label(line_list = list(aes(label = paste0("Node ", id, ": ", splitvar)),
                aes(label = paste0("N=", node.size[id], ", p", 
                                 ifelse(pvals < .001, "<.001",
                                        paste0("=", round(pvals, 3)))), 
                    size = 16)),
                    line_gpar = list(list(size = 16), 
                                     list(size = 16)),
                ids = "inner", fontface = "bold", size = 16) +
    geom_node_info() +
    geom_node_label(aes(label = paste0("N = ", node.size), 
                        fontface = "bold"),
                    ids = "terminal", nudge_y = 0.01, nudge_x = 0.01, size = 6) +
    geom_node_plot(gglist = list(
        geom_bar(aes(x = "", fill = species), size = 0.2, width = 0.9,
                 position = position_fill(), color = "black"),
        theme_minimal(base_family = "arial", base_size = 24),
        scale_fill_manual(values = c("gray50","gray55","gray60",
                                     "gray70","gray80","gray85",
                                     "blue","gray95")),
        xlab(""), 
        ylab("Probability"), 
        geom_text(aes(x = "", group = species, 
                      label = after_stat(count)),
                  stat = "count", position = position_fill(), 
                  vjust = 1., size = 6)),
        shared_axis_labels = TRUE, size = 1.2)

Stochastic-deterministic logical rules

Since the right side of the tree reports only one human-to-human mutational event (in region R1, class A3), with high probability only non-human comparisons satisfy the following rule:

rule <- (dat$autms == "A4" | (dat$autms == "A3" & dat$mut_type != "HRH"))
unique(as.character(dat[rule,]$species))

‘bolivian_monkey’
‘golden_monkey’
‘gelada_baboon’
‘silvery_gibbon’
‘chimpanzee’
‘bonobos’
‘gorilla’

Only human-to-human mutations satisfy the following rule:

idx <- dat$autms == "A1" & dat$regions == "R3" & (dat$mut_type == "HHY" | dat$mut_type == "YHH")
dat[ idx, ]

A data.frame: 2 × 5
	autms	regions	mut_type	cube	species
	<fct>	<fct>	<fct>	<fct>	<fct>
632	A1	R3	YHH	ACGT	human
687	A1	R3	HHY	ACGT	human

Only non-human comparisons satisfy the following rule:

rule <- (dat$autms == "A4" | (dat$autms == "A3" & dat$regions != "R1"))
unique(as.character(dat[rule,]$species))

‘bolivian_monkey’
‘golden_monkey’
‘gelada_baboon’
‘silvery_gibbon’
‘gorilla’
‘chimpanzee’
‘bonobos’

References

1. Sanchez R, Morgado E, Grau R. Gene algebra from a genetic code algebraic structure. J Math Biol. 2005;51(4):431-57. doi: 10.1007/s00285-005-0332-8. Epub 2005 Jul 13. PMID: 16012800.

2. Sanchez R, Barreto J. Genomic Abelian finite groups. 2021. doi: 10.1101/2021.06.01.446543.

3. José MV, Morgado ER, Sánchez R, Govezensky T. The 24 possible algebraic representations of the standard genetic code in six or in three dimensions. Adv Stud Biol. 2012;4:119-152.

4. Sanchez R. Symmetric group of the genetic-code cubes. Effect of the genetic-code architecture on the evolutionary process. MATCH Commun Math Comput Chem. 2018;79:527-560.

The Binary Alphabet of DNA

11th March 2019 (updated 12th March 2019) by Robersy

On the DNA Computer Binary Code

A partial order or a binary operation can be defined on any finite set in many different ways. Here, however, a partial order is defined on the set of the four DNA bases so that a Boolean lattice structure is obtained. A Boolean lattice is an algebraic structure that captures essential properties of both set operations and logic operations. The partial order rests on two physico-chemical properties of the DNA bases: the number of hydrogen bonds and the chemical type, purine {A, G} or pyrimidine {U, C}. This physico-mathematical description makes it possible to study the genetic information carried by DNA molecules as a computer binary code of zeros (0) and ones (1).

1. Boolean lattice of the four DNA bases

In any four-element Boolean lattice every element is comparable to every other, except two of them that are, nevertheless, complementary. Consequently, to build a four-base Boolean lattice it is necessary for the bases with the same number of hydrogen bonds in the DNA molecule and in different chemical types to be complementary elements in the lattice. In other words, the complementary bases in the DNA molecule (G≡C and A=T or A=U during the translation of mRNA) should be complementary elements in the Boolean lattice. Thus, there are four possible lattices, each one with a different base as the maximum element.

2. Boolean (logic) operations in the set of DNA bases

The Boolean algebra on the set of elements X will be denoted by $(B(X), \vee, \wedge)$. Here, the operators $\vee$ and $\wedge$ represent the classical "OR" and "AND" logical operations, applied term by term. From the definition of a Boolean algebra, it follows that this structure is (among other things) a partially ordered set in which any two elements $\alpha$ and $\beta$ have upper and lower bounds. In particular, the least upper bound of $\alpha$ and $\beta$ is the element $\alpha\vee\beta$, and the greatest lower bound is the element $\alpha\wedge\beta$. This equivalent partially ordered set is called a Boolean lattice.

In every Boolean algebra $(B(X), \vee, \wedge)$, any two elements $\alpha,\beta \in X$ satisfy $\alpha \le \beta$ if and only if $\neg\alpha\vee\beta=1$, where the symbol "$\neg$" stands for logical negation. If this equality holds, $\beta$ is said to be deduced from $\alpha$. Furthermore, the elements $\alpha$ and $\beta$ are said to be comparable if $\alpha \le \beta$ or $\alpha \ge \beta$. Otherwise, they are said to be incomparable.

Twenty-four isomorphic Boolean lattices can be built on the set of the four DNA bases [1]. Herein, we focus on the one described in reference [2], in which the DNA bases G and C are taken as the minimum and maximum elements of the Boolean lattice, respectively. The logic operations of this DNA computer code are given in the following tables:

OR (∨)   G   A   U   C
  G      G   A   U   C
  A      A   A   C   C
  U      U   C   U   C
  C      C   C   C   C

AND (∧)  G   A   U   C
  G      G   G   G   G
  A      G   A   G   A
  U      G   G   U   U
  C      G   A   U   C

It is well known that all finite Boolean algebras with the same number of elements are isomorphic. Therefore, our algebra $(B(X), \vee, \wedge)$ is isomorphic to the Boolean algebra $(\mathbb{Z}_2^2, \vee, \wedge)$, where $\mathbb{Z}_2 = \{0,1\}$. Hence, this DNA Boolean algebra can be represented by the correspondence $G \leftrightarrow 00$, $A \leftrightarrow 01$, $U \leftrightarrow 10$, $C \leftrightarrow 11$. So, in accordance with the operation table:

$A \vee U = C \leftrightarrow 01 \vee 10 = 11$
$U \wedge G = G \leftrightarrow 10 \wedge 00 = 00$
$G \vee C = C \leftrightarrow 00 \vee 11 = 11$

The logic negation ($\neg$) of a base yields the DNA complementary base: $\neg A = U \leftrightarrow \neg 01 = 10$; $\neg G = C \leftrightarrow \neg 00 = 11$
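The correspondence above can be checked mechanically. The following Python sketch is a minimal illustration, not the author's implementation: it assumes the encoding G ↔ 00, A ↔ 01, U ↔ 10, C ↔ 11 given in the text, and the helper names `b_or`, `b_and` and `b_not` are hypothetical. It computes OR, AND and negation as bitwise operations on 2-bit integers and prints both operation tables so they can be compared with the tables above.

```python
# Minimal sketch (hypothetical helper names): the DNA Boolean algebra
# (B(X), OR, AND) computed through the 2-bit encoding G=00, A=01, U=10, C=11.
BITS = {"G": 0b00, "A": 0b01, "U": 0b10, "C": 0b11}
BASE = {code: base for base, code in BITS.items()}  # 2-bit integer -> base


def b_or(x: str, y: str) -> str:
    """OR of two bases: bitwise OR of their 2-bit codes."""
    return BASE[BITS[x] | BITS[y]]


def b_and(x: str, y: str) -> str:
    """AND of two bases: bitwise AND of their 2-bit codes."""
    return BASE[BITS[x] & BITS[y]]


def b_not(x: str) -> str:
    """Logic negation: flip both bits, which yields the complementary base."""
    return BASE[~BITS[x] & 0b11]


if __name__ == "__main__":
    order = "GAUC"
    for name, op in (("OR", b_or), ("AND", b_and)):
        print(name, *order)
        for x in order:
            print(x, *(op(x, y) for y in order))
        print()
    print("A OR U =", b_or("A", "U"))    # C
    print("U AND G =", b_and("U", "G"))  # G
    print("NOT A =", b_not("A"))         # U
```

Working on integers keeps the isomorphism with $(\mathbb{Z}_2^2, \vee, \wedge)$ explicit: each logical operation on bases is literally the corresponding bitwise operation on their codes.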

Every Boolean lattice has an associated directed graph called its Hasse diagram, in which two nodes (elements) $\alpha$ and $\beta$ are joined by a directed edge from $\alpha$ to $\beta$ (or by a directed edge from $\beta$ to $\alpha$) if, and only if, $\alpha < \beta$ ($\alpha > \beta$) and there is no other element between $\alpha$ and $\beta$.
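As a minimal sketch of this definition, the Python code below derives the order relation from the rule $\alpha \le \beta$ iff $\neg\alpha\vee\beta=1$ and then keeps only the covering pairs, which are exactly the edges of the Hasse diagram. It assumes the 2-bit encoding G ↔ 00, A ↔ 01, U ↔ 10, C ↔ 11 used in this post, with 1 = 11 (base C); the function names `leq` and `hasse_edges` are hypothetical.

```python
from itertools import product

# Assumed encoding from the text; TOP is the element 1 (base C).
BITS = {"G": 0b00, "A": 0b01, "U": 0b10, "C": 0b11}
TOP = 0b11


def leq(x: str, y: str) -> bool:
    """x <= y iff (NOT x) OR y equals the maximum element 1."""
    return ((~BITS[x] & TOP) | BITS[y]) == TOP


def hasse_edges(elements: str) -> set:
    """Directed edges x -> y with x < y and no element strictly between them."""
    edges = set()
    for x, y in product(elements, repeat=2):
        if x != y and leq(x, y):
            between = any(
                z not in (x, y) and leq(x, z) and leq(z, y) for z in elements
            )
            if not between:
                edges.add((x, y))
    return edges


if __name__ == "__main__":
    print(sorted(hasse_edges("GAUC")))
    print("A, U comparable:", leq("A", "U") or leq("U", "A"))
```

With this encoding the edges are G → A, G → U, A → C and U → C, so A and U are the only pair of non-comparable bases.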

The figure shows the Hasse diagram corresponding to the Boolean algebra $(B(X), \vee, \wedge)$. There are twenty-four possible Hasse diagrams of the four DNA bases, and together they form a group isomorphic to the symmetric group of degree four, $S_4$ [1].

3. The Genetic code Boolean Algebras

Boolean algebras of codons are explicitly derived as the direct product $C(X) = B(X) \times B(X) \times B(X)$. These algebras are isomorphic to the dual Boolean algebras $(\mathbb{Z}_2^6, \vee, \wedge)$ and $(\mathbb{Z}_2^6, \wedge, \vee)$ induced by the isomorphism $B(X) \cong \mathbb{Z}_2^2$, where $X$ runs over the twenty-four possible ordered sets of four DNA bases [1]. For example:

CAG $\vee$ AUC = CCC $\leftrightarrow$ 110100 $\vee$ 011011 = 111111

ACG $\wedge$ UGA = GGG $\leftrightarrow$ 011100 $\wedge$ 100001 = 000000

$\neg$ (CAU) = GUA $\leftrightarrow$ $\neg$ (110110) = 001001
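The codon examples above can be reproduced with a short Python sketch. It assumes that a codon is encoded by concatenating the 2-bit codes of its three bases (first base in the two highest bits), which matches the bit strings shown; `encode`, `decode`, `c_or`, `c_and` and `c_not` are hypothetical names, not part of any published package.

```python
# Minimal sketch (hypothetical names): codon Boolean algebra on Z_2^6,
# obtained as the direct product of three copies of the base algebra.
BITS = {"G": 0b00, "A": 0b01, "U": 0b10, "C": 0b11}
BASE = {code: base for base, code in BITS.items()}


def encode(codon: str) -> int:
    """Concatenate the 2-bit codes of the three bases into a 6-bit integer."""
    value = 0
    for base in codon:
        value = (value << 2) | BITS[base]
    return value


def decode(value: int) -> str:
    """Split a 6-bit integer back into three bases (inverse of encode)."""
    return "".join(BASE[(value >> shift) & 0b11] for shift in (4, 2, 0))


def c_or(x: str, y: str) -> str:
    """Codon OR: bitwise OR on the 6-bit codes."""
    return decode(encode(x) | encode(y))


def c_and(x: str, y: str) -> str:
    """Codon AND: bitwise AND on the 6-bit codes."""
    return decode(encode(x) & encode(y))


def c_not(x: str) -> str:
    """Codon negation: flip all six bits (base-wise complement)."""
    return decode(~encode(x) & 0b111111)


if __name__ == "__main__":
    print(c_or("CAG", "AUC"), format(encode("CAG"), "06b"))   # CCC 110100
    print(c_and("ACG", "UGA"), format(encode("UGA"), "06b"))  # GGG 100001
    print(c_not("CAU"), format(encode("GUA"), "06b"))         # GUA 001001
```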

The Hasse diagram of the codon Boolean algebra, derived as the direct product of the Boolean algebra of the four DNA bases given in the above operation table, is shown in the figure.

The Hasse diagram contains chains and anti-chains. A subset of a Boolean lattice is called a chain if any two of its elements are comparable; conversely, if no two of its elements are comparable, the subset is called an anti-chain. In the Hasse diagram of codons shown in the figure, all chains of maximal length share the same minimum element GGG and the same maximum element CCC. Two codons lie in the same chain of maximal length if and only if they are comparable, for example, the chain: GGG $\leftrightarrow$ GAG $\leftrightarrow$ AAG $\leftrightarrow$ AAA $\leftrightarrow$ AAC $\leftrightarrow$ CAC $\leftrightarrow$ CCC
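The example chain can be checked with a minimal Python sketch. It assumes the 2-bit-per-base encoding from the text, under which $x \le y$ means that every bit set in $x$ is also set in $y$; a chain of maximal length then runs from GGG to CCC and each step sets exactly one more bit. The names `encode`, `codon_leq` and `is_maximal_chain` are hypothetical.

```python
# Minimal sketch (hypothetical names): comparability and maximal chains in
# the codon lattice, with codons encoded as 6-bit integers.
BITS = {"G": 0b00, "A": 0b01, "U": 0b10, "C": 0b11}


def encode(codon: str) -> int:
    """6-bit integer: the 2-bit codes of the three bases, first base highest."""
    value = 0
    for base in codon:
        value = (value << 2) | BITS[base]
    return value


def codon_leq(x: str, y: str) -> bool:
    """x <= y iff every bit set in x is also set in y (x AND y == x)."""
    return (encode(x) & encode(y)) == encode(x)


def is_maximal_chain(chain: list) -> bool:
    """True if the chain runs from GGG to CCC and every step is a covering
    relation (exactly one additional bit set), so it cannot be refined."""
    if chain[0] != "GGG" or chain[-1] != "CCC":
        return False
    for x, y in zip(chain, chain[1:]):
        if not codon_leq(x, y) or bin(encode(x) ^ encode(y)).count("1") != 1:
            return False
    return True


if __name__ == "__main__":
    chain = ["GGG", "GAG", "AAG", "AAA", "AAC", "CAC", "CCC"]
    print(is_maximal_chain(chain))                           # True
    print(codon_leq("GAG", "GUG"), codon_leq("GUG", "GAG"))  # False False
```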

The symmetry of the Hasse diagram reflects the role of hydrophobicity in the distribution of codons assigned to each amino acid. In general, codons that code for amino acids with extreme hydrophobicity differences lie in different chains of maximal length. In particular, since A and U are not comparable, a codon with U as its second base never appears in the same chain of maximal length as a codon with A as its second base. For that reason, it is impossible to obtain hydrophobic amino acids with codons having U in the second position through deductions from hydrophilic amino acids with codons having A in the second position.
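The combinatorial part of this argument can be checked by brute force. The sketch below assumes the 2-bit encoding from the text and uses hypothetical helper names; it enumerates all 64 codons and confirms that no codon with A in the second position is comparable with any codon with U in the second position. It says nothing about hydrophobicity itself, which is the biological interpretation given in the text.

```python
from itertools import product

# Assumed encoding from the text: G=00, A=01, U=10, C=11.
BITS = {"G": 0b00, "A": 0b01, "U": 0b10, "C": 0b11}


def encode(codon: str) -> int:
    """6-bit integer: the 2-bit codes of the three bases, first base highest."""
    value = 0
    for base in codon:
        value = (value << 2) | BITS[base]
    return value


def comparable(x: str, y: str) -> bool:
    """Two codons are comparable if one lies below the other in the lattice."""
    a, b = encode(x), encode(y)
    return (a & b) == a or (a & b) == b


codons = ["".join(t) for t in product("GAUC", repeat=3)]
second_a = [c for c in codons if c[1] == "A"]
second_u = [c for c in codons if c[1] == "U"]
comparable_pairs = [(x, y) for x in second_a for y in second_u if comparable(x, y)]

if __name__ == "__main__":
    print(len(codons), len(second_a), len(second_u), len(comparable_pairs))  # 64 16 16 0
```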

There are twenty-four Hasse diagrams of codons, corresponding to the twenty-four genetic-code Boolean algebras. These algebras form a group isomorphic to the symmetric group of degree four, $S_4$ [1]. In summary, the DNA binary code is not arbitrary, but subject to logic operations with an underlying biophysical meaning.

References

[1] Sánchez R. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun Math Comput Chem, 2018, 79:527–60.
[2] Sánchez R, Morgado E, Grau R. A genetic code Boolean structure. I. The meaning of Boolean deductions. Bull Math Biol, 2005, 67:1–14.

The genetic-code vector space B^3 over the Galois field GF(5)

26th January 2019 by Robersy

**The $\mathbb{Z_5}$-vector space $\mathfrak{B}^3$ over the field $(\mathbb{Z_5}, +, .)$**

1. Background

This is a formal introduction to the genetic-code $\mathbb{Z_5}$-vector space $\mathfrak{B}^3$ over the field $(\mathbb{Z_5}, +, .)$. This mathematical model is defined based on the physicochemical properties of DNA bases (see previous post). This introduction can be complemented with a Wolfram Computable Document Format (CDF) file named IntroductionToZ5GeneticCodeVectorSpace.cdf, available on GitHub. The CDF provides a graphical user interface with an interactive, didactic introduction to the mathematical biology background explained here. To interact with a CDF, users need the Wolfram CDF Player or Mathematica. The Wolfram CDF Player is freely available (with easy installation on Windows and Linux).

2. Biological mathematical model

If the Watson-Crick base pairings are symbolically expressed by means of the sum “+” operation, in such a way that G + C = C + G = D and U + A = A + U = D hold, then this requirement leads us to define an additive group ($\mathfrak{B}$, +) on the set of five DNA bases $\mathfrak{B}$. Explicitly, it was required that bases with the same number of hydrogen bonds in the DNA molecule and different chemical types be algebraic inverses in the additive group defined on the set of DNA bases $\mathfrak{B}$. In fact, eight sum tables (like the one shown below) that satisfy these constraints can be defined on eight ordered sets: {D, A, C, G, U}, {D, U, C, G, A}, {D, A, G, C, U}, {D, U, G, C, A}, {D, G, A, U, C}, {D, G, U, A, C}, {D, C, A, U, G} and {D, C, U, A, G} [1,2]. The sets originated by these base orders are called the strong-weak ordered sets of bases [1,2] since, for each one of them, the algebraic-complementary bases are DNA complementary bases as well, pairing with three hydrogen bonds (strong, G:::C) or two hydrogen bonds (weak, A::U). We shall denote this set SW.
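The count of eight ordered sets can be reproduced with a small search. This Python sketch assumes, as in the sum table of this post, that D occupies position 0 (the neutral element) and that the base at position i behaves as i in $\mathbb{Z}_5$; it keeps the orders in which each base and its Watson-Crick complement are additive inverses. The function name `sw_orders` is hypothetical.

```python
from itertools import permutations

# Watson-Crick complements that must become additive inverses.
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def sw_orders() -> list:
    """Orders (D, x1, x2, x3, x4) such that, reading position i as i in Z_5,
    every base plus its complement equals 0 = D (mod 5)."""
    orders = []
    for rest in permutations("ACGU"):
        order = ("D",) + rest
        position = {base: i for i, base in enumerate(order)}
        if all((position[b] + position[COMPLEMENT[b]]) % 5 == 0 for b in COMPLEMENT):
            orders.append(order)
    return orders


if __name__ == "__main__":
    for order in sw_orders():
        print("{" + ", ".join(order) + "}")
```

The condition forces the bases at positions 1 and 4 to be complementary, and likewise those at positions 2 and 3, which leaves 4 × 2 = 8 possible orders.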

A set of extended base triplets is defined as $\mathfrak{B}^3$ = {XYZ | X, Y, Z $\in\mathfrak{B}$}, where, to keep the usual biological notation for codons, the triplet of letters $XYZ\in\mathfrak{B}^3$ denotes the vector $(X,Y,Z)\in\mathfrak{B}^3$ and $\mathfrak{B} =$ {D, A, C, G, U}. An Abelian group on the set of extended triplets can be defined as the direct third power of the group $(\mathfrak{B},+)$:

$(\mathfrak{B}^3,+) = (\mathfrak{B},+)×(\mathfrak{B},+)×(\mathfrak{B},+)$

where X, Y, Z $\in\mathfrak{B}$, and the operation “+” is defined as shown in the table [2]. Next, for all elements $\alpha\in\mathbb{Z}_{(+)}$ (the set of positive integers) and for all codons $XYZ\in(\mathfrak{B}^3,+)$, the element:

$\alpha \bullet XYZ = \overbrace{XYZ+XYZ+\dots+XYZ}^{\hbox{$\alpha$ times}}\in(\mathfrak{B}^3,+)$ is well defined. In addition, $0 \bullet XYZ =$ DDD for all $XYZ\in(\mathfrak{B}^3,+)$. As a result, $(\mathfrak{B}^3,+)$ is a three-dimensional (3D) $\mathbb{Z_5}$-vector space over the field $(\mathbb{Z_5}, +, .)$ of the integers modulo 5, which is isomorphic to the Galois field GF(5). Notice that the Abelian groups $(\mathbb{Z}_5, +)$ and $(\mathfrak{B},+)$ are isomorphic. For the sake of brevity, the same notation $\mathfrak{B}^3$ will be used to denote the group $(\mathfrak{B}^3,+)$ and the vector space defined on it.
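The following Python sketch illustrates the group $(\mathfrak{B}^3,+)$ and the scalar multiplication $\alpha \bullet XYZ$ defined as repeated addition. It assumes the base order D, A, C, G, U ↔ 0, 1, 2, 3, 4 of the sum table in this post; `add` and `smul` are hypothetical names. Because 5 • XYZ = DDD for every triplet, scalars effectively act modulo 5, which is why the scalar field can be taken as $\mathbb{Z}_5$.

```python
# Minimal sketch (hypothetical names): (B^3, +) and alpha . XYZ as repeated sums,
# with the base order D, A, C, G, U read as 0, 1, 2, 3, 4.
ORDER = "DACGU"
INDEX = {base: i for i, base in enumerate(ORDER)}


def add(x: str, y: str) -> str:
    """Componentwise sum of two extended triplets, each component in (B, +)."""
    return "".join(ORDER[(INDEX[a] + INDEX[b]) % 5] for a, b in zip(x, y))


def smul(alpha: int, xyz: str) -> str:
    """alpha . XYZ = XYZ + ... + XYZ (alpha times); 0 . XYZ = DDD."""
    result = "DDD"
    for _ in range(alpha):
        result = add(result, xyz)
    return result


if __name__ == "__main__":
    print(add("CAG", "AUC"))  # GDD
    print(smul(3, "CAG"))     # AGU
    print(smul(5, "CAG"))     # DDD
```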

+	D	A	C	G	U
D	D	A	C	G	U
A	A	C	G	U	D
C	C	G	U	D	A
G	G	U	D	A	C
U	U	D	A	C	G

This operation is only one of the eight sum operations that can be defined, one on each of the ordered sets of bases from SW.
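To see that the table above is one instance of a general recipe, the following Python sketch builds the Cayley table of $(\mathfrak{B}, +)$ from any ordered set, treating the base at position i as i in $\mathbb{Z}_5$. This reading of the ordered sets is an assumption consistent with the table above; `sum_table` is a hypothetical name.

```python
# Minimal sketch (hypothetical name): the sum table induced by an ordered set
# of bases, where the base at position i plays the role of i in Z_5.
def sum_table(order: str) -> dict:
    """Map each base x to its row (x + y for y in order) as a string."""
    return {
        x: "".join(order[(i + j) % 5] for j in range(len(order)))
        for i, x in enumerate(order)
    }


if __name__ == "__main__":
    for order in ("DACGU", "DGAUC"):
        table = sum_table(order)
        print("+", *order)
        for x in order:
            print(x, *table[x])
        print()
```

For the order D, A, C, G, U the rows coincide with the table above; for another SW order such as D, G, A, U, C the table changes, but G + C and A + U still give D.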

3. The canonical base of the $\mathbb{Z_5}$-vector space $\mathfrak{B}^3$

Next, in the vector space $\mathfrak{B}^3$, the vectors (extended codons) e₁ = ADD, e₂ = DAD and e₃ = DDA are linearly independent, i.e., $\sum\limits_{i=1}^3 c_i e_i =$ DDD implies $c_1=0, c_2=0$ and $c_3=0$ for $c_1, c_2, c_3 \in\mathbb{Z_5}$. Moreover, the representation of every extended triplet $XYZ\in\mathfrak{B}^3$ on the field $\mathbb{Z_5}$ as $XYZ=xe_1+ye_2+ze_3$ is unique, and the generating set $e_1, e_2$, and $e_3$ is a canonical base for the $\mathbb{Z_5}$-vector space $\mathfrak{B}^3$. The elements $x, y, z \in\mathbb{Z_5}$ are said to be the coordinates of the extended triplet $XYZ\in\mathfrak{B}^3$ in the canonical base ($e_1, e_2, e_3$) [3].
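Under the base order D, A, C, G, U ↔ 0, 1, 2, 3, 4 assumed in the sum table of this post, the coordinates of a triplet in the canonical base are simply the indices of its three bases. The Python sketch below (hypothetical names `coordinates` and `combine`) computes them, rebuilds the triplet as $xe_1+ye_2+ze_3$, and checks linear independence by brute force over all $5^3$ coefficient triples.

```python
from itertools import product

ORDER = "DACGU"
INDEX = {base: i for i, base in enumerate(ORDER)}
E1, E2, E3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)  # ADD, DAD, DDA as vectors in Z_5^3


def coordinates(xyz: str) -> tuple:
    """Coordinates (x, y, z) of an extended triplet in the canonical base."""
    return tuple(INDEX[base] for base in xyz)


def combine(x: int, y: int, z: int) -> str:
    """Compute x*e1 + y*e2 + z*e3 in Z_5^3 and translate it back to bases."""
    vector = [(x * a + y * b + z * c) % 5 for a, b, c in zip(E1, E2, E3)]
    return "".join(ORDER[v] for v in vector)


# Only the trivial combination gives DDD, so e1, e2, e3 are linearly independent.
zero_combinations = [c for c in product(range(5), repeat=3) if combine(*c) == "DDD"]

if __name__ == "__main__":
    print(coordinates("CAG"), combine(2, 1, 3))  # (2, 1, 3) CAG
    print(zero_combinations)                     # [(0, 0, 0)]
```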

[1] José MV, Morgado ER, Sánchez R, Govezensky T. The 24 Possible Algebraic Representations of the Standard Genetic Code in Six or in Three Dimensions. Adv Stud Biol, 2012, 4:119–52.
[2] Sánchez R. Symmetric Group of the Genetic-Code Cubes. Effect of the Genetic-Code Architecture on the Evolutionary Process. MATCH Commun Math Comput Chem, 2018, 79:527–60.
[3] Sánchez R, Grau R. An algebraic hypothesis about the primeval genetic code architecture. Math Biosci, 2009, 221:60–76.

Group operations on the set of five DNA bases

19th January 2019 (updated 3rd February 2019) by Robersy

General biochemical background. The genetic information on how to build proteins able to perform different biological/biochemical functions is encoded in the DNA sequence. Code-words of three letters/bases, called triplets or codons, are used to encode the information that will be used to synthesize proteins. Every codon, except the stop codons, encodes one amino acid, and every amino acid can be encoded by one or more codons. The genetic code is the biochemical system that establishes the rules by which the nucleotide sequence of a protein-encoding gene is transcribed into mRNA codon sequences and then translated into the amino acid sequences of the corresponding proteins. The genetic code is built on the four-letter alphabet found in DNA molecules. These “letters” are the DNA bases adenine, guanine, cytosine, and thymine, usually denoted A, G, C, and T, respectively (in an RNA molecule, T is replaced by U, uracil).

In the DNA molecules, nucleotide bases are paired according to the following rule (Watson-Crick base pairings): G:C, A:T. That is, base G is the complementary base of C, and A is the complementary base of T (or U) in the DNA (or in the RNA) molecule and vice-versa.

Each DNA/RNA base can be classified according to three criteria: chemical type (purines: A and G, or pyrimidines: C and T), number of hydrogen bonds (strong or weak), and whether the base has an amino group or a keto group [1]. Each criterion produces a partition of the set of bases into two classes [2].

Two hydrogen bonds are formed in the pairing of adenine (A) and thymine (T). Yikrazuul [Public domain], from Wikimedia Commons

The pairing between guanine and cytosine involves three hydrogen bonds. Yikrazuul [Public domain], from Wikimedia Commons

The relationships between the DNA nucleotide bases, quantitatively expressed through the Watson-Crick base pairings, permit the representation of the standard genetic code as a cube inserted in the Euclidean three-dimensional vector space $\mathbb{R}^3$ [3,4]. In particular, it is plausible that the present standard genetic code was derived from an ancestral code architecture with five or more bases (see main text for full discussion). The algebraic and biological model suggests the plausibility of a transition from a primeval code with an extended DNA alphabet $\mathfrak{B}$ = {D, A, C, G, U} to the present standard code, where the symbol “D” represents one or more hypothetical bases with unspecific pairings. It is important to observe that, besides the evidence from organic chemistry experiments supporting the necessity of five or more DNA bases in the primordial genetic system apparatus, the formal development of the algebraic theory itself necessarily leads to an extension of the DNA base alphabet. In fact, the additional base was implicit in the multiple sequence alignments of DNA sequences as gaps representing insertion and deletion mutations (indel mutations). The importance of considering indel mutations in phylogenetic analysis was discussed in [2]. Perhaps the most significant role of a fifth base in current DNA molecules is the epigenetic role played by cytosine DNA methylation (CDM). CDM patterning is a feature of the epigenome that is highly responsive to environmental stress and is associated with trans-generational adaptation in plants and animals.

If the Watson-Crick base pairings are symbolically expressed by means of the sum “+” operation in such a way that the relationships G + C = C + G = D and A + U = U + A = D hold, then this requirement leads to the definition of an additive (Abelian) group on the set of five RNA (DNA) bases. Explicitly, it is required that bases with the same number of hydrogen bonds in the DNA molecule and different chemical types be algebraic inverses of each other in the additive group defined on the set of DNA bases. In other words, the complementary RNA bases G:C and A:U (or G:C and A:T in DNA) are, respectively, algebraic complements. This definition also reflects the non-specific pairings of the ancient hypothetical base(s) D, which is taken as the neutral element of the sum operation. Next, there is only one possible definition of the multiplication operation ($\times$), with base A as its neutral element, that completes a finite (Galois) field structure isomorphic to the field $\mathbb{Z_5}$ of integers modulo 5 (GF(5)). The simplicity of these operations can be seen in Table 1.

Table 1. Operation tables of the Galois field (GF(5)) on the ordered set of the extended bases alphabet $\mathfrak{B}$={D, A, C, G, U}, and on $\mathbb{Z_5}$.

Sum
+	0	1	2	3	4
0	0	1	2	3	4
1	1	2	3	4	0
2	2	3	4	0	1
3	3	4	0	1	2
4	4	0	1	2	3

Product
×	0	1	2	3	4
0	0	0	0	0	0
1	0	1	2	3	4
2	0	2	4	1	3
3	0	3	1	4	2
4	0	4	3	2	1
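Table 1 gives the operations on $\mathbb{Z}_5$; relabeling 0, 1, 2, 3, 4 as D, A, C, G, U (the ordered set in the caption) gives the same field on the extended base alphabet. The Python sketch below is a minimal illustration under that assumption, with hypothetical function names: D is the additive neutral element, A the multiplicative one, and every base other than D has a multiplicative inverse.

```python
# Minimal sketch (hypothetical names): GF(5) written on the extended alphabet,
# using the correspondence D, A, C, G, U <-> 0, 1, 2, 3, 4.
ORDER = "DACGU"
INDEX = {base: i for i, base in enumerate(ORDER)}


def b_add(x: str, y: str) -> str:
    """Sum of two bases (addition modulo 5)."""
    return ORDER[(INDEX[x] + INDEX[y]) % 5]


def b_mul(x: str, y: str) -> str:
    """Product of two bases (multiplication modulo 5)."""
    return ORDER[(INDEX[x] * INDEX[y]) % 5]


def b_inv(x: str) -> str:
    """Multiplicative inverse of a base other than D (the zero element)."""
    if x == "D":
        raise ZeroDivisionError("D is the additive neutral element")
    return next(y for y in ORDER if b_mul(x, y) == "A")


if __name__ == "__main__":
    print("x", *ORDER)
    for x in ORDER:
        print(x, *(b_mul(x, y) for y in ORDER))
```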

Further reading is provided in the Computable Document Format (CDF) file named IntroductionToZ5GeneticCodeVectorSpace.cdf. With this CDF, readers can gain a better understanding of the biological and algebraic background of the genetic-code algebras presented in reference [4]. It is a didactic and interactive visualization introducing the genetic-code $\mathbb{Z}_5$-vector space.

[1] Cornish-Bowden A. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res, 1985, 13(9):3021–30.
[2] Jimenez-Montano MA, de la Mora-Basanez CR, Poschel T. The hypercube structure of the genetic code explains conservative and non-conservative aminoacid substitutions in vivo and in vitro. Biosystems, 1996, 39:117–25.
[3] Sánchez R, Grau R, Morgado E. A novel Lie algebra of the genetic code over the Galois field of four DNA bases. Math Biosci, 2006, 202:156–74.
[4] Sánchez R, Grau R. An algebraic hypothesis about the primeval genetic code architecture. Math Biosci, 2009, 221:60–76.

Genomic Mathematics

The fabric of life
