Sequence types and features ontology (SO)

A structured controlled vocabulary for sequence annotation, for the exchange of annotation data and for the description of sequence objects in databases.


<new term> [SO_0002117]

[]

2KB_downstream_variant [SO_0002083]

A sequence variant located within 2KB 3’ of a gene.

2KB_upstream_variant [SO_0001636]

A sequence variant located within 2KB 5’ of a gene.

3_prime_UTR_elongation [SO_0002016]

A sequence variant that causes the extension of 3’ UTR, with regard to the reference sequence.

3_prime_UTR_exon_variant [SO_0002089]

A UTR variant of exonic sequence of the 3’ UTR. Requested by visze github tracker ID 346.

3_prime_UTR_intron_variant [SO_0002090]

A UTR variant of intronic sequence of the 3’ UTR. Requested by visze github tracker ID 346.

3_prime_UTR_truncation [SO_0002015]

A sequence variant that causes the reduction of the 3’ UTR with regard to the reference sequence.

3_prime_UTR_variant [SO_0001624]

A UTR variant of the 3’ UTR. EBI term 3prime UTR variations - In 3prime UTR.

3D_polypeptide_structure_variant [SO_0001599]

A sequence variant that changes the resulting polypeptide structure.

4_methylcytosine [SO_0001919]

A cytosine methylated at the 4 nitrogen.

5_carboxylcytosine [SO_0001966]

A modified DNA cytosine base feature, modified by a carboxy group at the 5 carbon.

5_formylcytosine [SO_0001961]

A modified DNA cytosine base feature, modified by a formyl group at the 5 carbon.

5_hydroxymethylcytosine [SO_0001960]

A modified DNA cytosine base feature, modified by a hydroxymethyl group at the 5 carbon.

5_methylcytosine [SO_0001918]

A cytosine methylated at the 5 carbon.

5_prime_UTR_elongation [SO_0002014]

A sequence variant that causes the extension of 5’ UTR, with regard to the reference sequence.

5_prime_UTR_exon_variant [SO_0002092]

A UTR variant of exonic sequence of the 5’ UTR. Requested by visze github tracker ID 346.

5_prime_UTR_intron_variant [SO_0002091]

A UTR variant of intronic sequence of the 5’ UTR. Requested by visze github tracker ID 346.

5_prime_UTR_premature_start_codon_gain_variant [SO_0001988]

A 5’ UTR variant where a premature start codon is gained.

5_prime_UTR_premature_start_codon_loss_variant [SO_0001989]

A 5’ UTR variant where a premature start codon is lost.

5_prime_UTR_premature_start_codon_variant [SO_0001983]

A 5’ UTR variant where a premature start codon is introduced, moved or lost. Requested by Andy Menzies at the Sanger. This isn’t necessarily a protein coding change. A premature start codon can affect the production of a mature protein product by providing a competing translation start point. Some genes balance their expression this way: for example, THPO requires the presence of a premature start codon to limit expression, and its loss leads to familial thrombocythemia.

5_prime_UTR_truncation [SO_0002013]

A sequence variant that causes the reduction of the 5’ UTR with regard to the reference sequence.

5_prime_UTR_variant [SO_0001623]

A UTR variant of the 5’ UTR. EBI term: 5prime UTR variations - In 5prime UTR (untranslated region).

500B_downstream_variant [SO_0001634]

A sequence variant located within a half KB of the end of a gene.

5KB_downstream_variant [SO_0001633]

A sequence variant located within 5 KB of the end of a gene. EBI term Downstream variations - Within 5 kb downstream of the 3prime end of a transcript.

5KB_upstream_variant [SO_0001635]

A sequence variant located within 5KB 5’ of a gene. EBI term Upstream variations - Within 5 kb upstream of the 5prime end of a transcript.
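
The distance-based terms in this list (500B, 2KB and 5KB, upstream or downstream) can be illustrated with a small classifier. This is a minimal sketch, not part of the ontology: the function name `flanking_variant_terms` is hypothetical, and it assumes 1-based inclusive gene coordinates, 1 KB = 1000 bases, inclusive window boundaries, and distances measured from the gene rather than from a transcript.

```python
# Hypothetical helper; window sizes follow the definitions in this list,
# assuming 1 KB = 1000 bases and inclusive window boundaries.
UPSTREAM_WINDOWS = [
    (2000, "2KB_upstream_variant"),
    (5000, "5KB_upstream_variant"),
]
DOWNSTREAM_WINDOWS = [
    (500, "500B_downstream_variant"),
    (2000, "2KB_downstream_variant"),
    (5000, "5KB_downstream_variant"),
]


def flanking_variant_terms(pos, gene_start, gene_end, strand):
    """Return every flanking-variant term whose window contains pos.

    Coordinates are 1-based and inclusive, with gene_start <= gene_end.
    strand is "+" or "-". A position inside the gene returns an empty list.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if gene_start <= pos <= gene_end:
        return []
    if pos < gene_start:
        distance = gene_start - pos
        upstream = strand == "+"  # lower coordinates are 5' on the + strand
    else:
        distance = pos - gene_end
        upstream = strand == "-"  # higher coordinates are 5' on the - strand
    windows = UPSTREAM_WINDOWS if upstream else DOWNSTREAM_WINDOWS
    return [term for limit, term in windows if distance <= limit]
```

Because the windows nest, a variant 300 bases past the 3’ end of a gene falls under all three downstream terms; a caller that wants a single label could take the first element of the returned list.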

8_oxoadenine [SO_0001967]

A modified DNA adenine base, at the 8 carbon, often the product of DNA damage.

8_oxoguanine [SO_0001965]

A modified DNA guanine base, at the 8 carbon, often the product of DNA damage.

A_box_type_1 [SO_0001675]

An A box within an RNA polymerase III type 1 promoter. The A box can be found in the promoters of type 1 and type 2 (pol III) so sub-typing here allows the part of relationship of the subtypes to remain true.

A_box_type_2 [SO_0001676]

An A box within an RNA polymerase III type 2 promoter. The A box can be found in the promoters of type 1 and type 2 (pol III) so sub-typing here allows the part of relationship of the subtypes to remain true.

A_minor_RNA_motif [SO_0000022]

A region forming a motif, composed of adenines, where the minor groove edges are inserted into the minor groove of another helix.

A_to_C_transversion [SO_1000024]

A transversion from adenine to cytidine.

A_to_G_transition [SO_1000015]

A transition of an adenine to a guanine.

A_to_T_transversion [SO_1000025]

A transversion from adenine to thymine.
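
The transition and transversion terms in this list follow the usual rule: a transition keeps the base class (purine to purine, or pyrimidine to pyrimidine), while a transversion switches it. A minimal sketch for DNA bases, with a hypothetical function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same base class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```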

AACCCT_box [SO_0001901]

A conserved 17-bp sequence (5’-ATCA(C/A)AACCCTAACCCT-3’) commonly present upstream of the start site of histone transcription units functioning as a transcription factor binding site.

aberrant_processed_transcript [SO_0000681]

A transcript that has been processed “incorrectly”, for example by the failure of splicing of one or more exons.

accessible_DNA_region [SO_0002331]

A region of DNA that is depleted of nucleosomes and accessible to DNA-binding proteins including transcription factors and nucleases. Added as part of GREEKC terms. See GitHub Issues #531 & #534.

Ace2_UAS [SO_0001857]

A promoter element with consensus sequence CCAGCC, bound by the fungal transcription factor Ace2.

active_peptide [SO_0001064]

Active peptides are proteins which are biologically active, released from a precursor molecule. Hormones, neuropeptides and antimicrobial peptides are active peptides. They are typically short (<40 amino acids) in length.

adaptive_island [SO_0000775]

An adaptive island is a genomic island that provides an adaptive advantage to the host. The iron-uptake ability of many pathogens is conveyed by adaptive islands. Ulrich Dobrindt, Bianca Hochhut, Ute Hentschel & Jorg Hacker, Genomic islands in pathogenic and environmental microorganisms, Nature Reviews Microbiology 2, 414-424 (2004); doi:10.1038/nrmicro884.

alanine [SO_0001435]

A non-polar, hydrophobic amino acid encoded by the codons GCN (GCT, GCC, GCA and GCG). A place holder for a cross product with chebi.
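
Degenerate codon notation such as GCN, used in the amino acid entries of this list, expands to concrete codons by substituting each ambiguity code with the bases it stands for. A minimal sketch; the function name and the small lookup table are illustrative and cover only the unambiguous bases plus N:

```python
from itertools import product

# Only the codes needed for this example; N stands for any base.
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def expand_codon(codon):
    """Expand a degenerate DNA codon into the list of concrete codons it covers."""
    choices = [IUPAC_BASES[base] for base in codon.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

For example, `expand_codon("GCN")` yields the four alanine codons named in the definition above.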

alanine_tRNA_primary_transcript [SO_0000211]

A primary transcript encoding alanyl tRNA.

alanyl_tRNA [SO_0000254]

A tRNA sequence that has an alanine anticodon, and a 3’ alanine binding region.

allele [SO_0001023]

An allele is one of a set of coexisting sequence variants of a gene.

allelic_frequency [SO_0002119]

A physical quality which inheres to the allele by virtue of the number of instances of the allele within a population. This is the relative frequency of the allele at a given locus in a population. Requested by HL7 clinical genomics group.

allelically_excluded [SO_0000137]

Allelic exclusion is a process occurring in diploid organisms, where a gene is inactivated and not expressed in that cell. Examples are x-inactivation and immunoglobulin formation.

allelically_excluded_gene [SO_0000897]

A gene that is allelically_excluded.

allopolyploid [SO_0001256]

A polyploid where the multiple chromosome set was derived from a different organism.

alpha_beta_motif [SO_0100008]

A motif of five consecutive residues and two H-bonds in which: H-bond between CO of residue(i) and NH of residue(i+4), H-bond between CO of residue(i) and NH of residue(i+3), phi angles of residues (i+1), (i+2) and (i+3) are negative.

alteration_attribute [SO_0001508]

An attribute of alteration of one or more chromosomes.

alternate_sequence_site [SO_0001149]

Description of sequence variants produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. Discrete.

alternately_spliced_gene_encodeing_one_transcript [SO_0000503]

[alternately_spliced_gene_encodeing_one_transcript]

alternately_spliced_gene_encoding_greater_than_one_transcript [SO_0000543]

[alternately_spliced_gene_encoding_greater_than_one_transcript]

alternatively_spliced [SO_0000877]

An attribute describing a situation where a gene may encode for more than 1 transcript.

alternatively_spliced_gene_encoding_greater_than_1_polypeptide_coding_regions_overlapping [SO_1001194]

[alternatively_spliced_gene_encoding_greater_than_1_polypeptide_coding_regions_overlapping]

alternatively_spliced_transcript [SO_1001187]

A transcript that is alternatively spliced.

alternatively_spliced_transcript_encoding_greater_than_1_polypeptide_different_start_codon_different_stop_codon_coding_regions_non_overlapping [SO_1001244]

[alternatively_spliced_transcript_encoding_greater_than_1_polypeptide_different_start_codon_different_stop_codon_coding_regions_non_overlapping; alternatively_spliced_transcript_encoding_greater_than_1_polypeptide_different_start_codon_different_stop_codon_coding_regions_non-overlapping]

Alu_deletion [SO_0002070]

A deletion of an Alu mobile element with respect to a reference.

Alu_insertion [SO_0002063]

An insertion of sequence from the Alu family of mobile elements.

ambisense_ssRNA_viral_sequence [SO_0001202]

An ambisense_RNA_virus is a ss_RNA_viral_sequence that is the sequence of a single stranded RNA virus with both messenger and anti messenger polarity.

amino_acid_deletion [SO_0001604]

A sequence variant within a CDS resulting in the loss of an amino acid from the resulting polypeptide.

amino_acid_insertion [SO_0001605]

A sequence variant within a CDS resulting in the gain of an amino acid to the resulting polypeptide.

amino_acid_substitution [SO_0001606]

A sequence variant of a codon resulting in the substitution of one amino acid for another in the resulting polypeptide.

amplification_origin [SO_0000750]

An origin_of_replication that is used for the amplification of a chromosomal nucleic acid sequence.

anchor_binding_site [SO_0000977]

Part of an edited transcript only. [anchor_binding_site; transcript_region; anchor binding site]

anchor_region [SO_0000931]

A region of a guide_RNA that base-pairs to a target mRNA.

androgen_response_element [SO_0001853]

A non-palindromic sequence found in the promoters of genes whose expression is regulated in response to androgen.

aneuploid_chromosome [SO_0000550]

A chromosome structural variation whereby either a chromosome exists in addition to the normal chromosome complement or is lacking. Examples are Nullo-4, Haplo-4 and triplo-4 in Drosophila.

annotation_directed_improved_draft [SO_0001489]

The status of a whole genome sequence, where annotation and verification of coding regions have occurred.

anti_ARRET [SO_0001926]

A non-coding RNA transcript, derived from the transcription of the telomere. These transcripts are antisense of ARRET transcripts. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.

anticodon [SO_0001174]

A sequence of three nucleotide bases in tRNA which recognizes a codon in mRNA.

anticodon_loop [SO_0001173]

A sequence of seven nucleotide bases in tRNA which contains the anticodon. It has the sequence 5’-pyrimidine-purine-anticodon-modified purine-any base-3’.

antiparallel_beta_strand [SO_0001112]

A peptide region which is hydrogen bonded to another region of peptide running in the opposite direction (one running N-terminal to C-terminal and one running C-terminal to N-terminal). Hydrogen bonding occurs between every other C=O from one strand to every other N-H on the adjacent strand. In this case, if two atoms C-alpha (i) and C-alpha (j) are adjacent in two hydrogen-bonded beta strands, then they form two mutual backbone hydrogen bonds to each other’s flanking peptide groups; this is known as a close pair of hydrogen bonds. The peptide backbone dihedral angles (phi, psi) are about (-140 degrees, 135 degrees) in antiparallel sheets. Range.

antisense_lncRNA [SO_0001904]

Non-coding RNA transcribed from the opposite DNA strand compared with other transcripts and overlapping in part with sense RNA. Relationship is_a SO:0000644 antisense_RNA added 23 April 2021. See GitHub Issue #443.

antisense_primary_transcript [SO_0000645]

The reverse complement of the primary transcript.
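
As an illustration of the definition above, the reverse complement of an RNA sequence can be computed by complementing each base and reversing the result. A minimal sketch with a hypothetical function name, assuming the sequence uses only the RNA alphabet A, C, G, U:

```python
# Translation table for RNA base complements (upper and lower case).
_RNA_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return seq.translate(_RNA_COMPLEMENT)[::-1]
```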

AP_1_binding_site [SO_0001842]

A promoter element with consensus sequence TGACTCA, bound by AP-1 and related transcription factors.

apicoplast_chromosome [SO_0001259]

A chromosome originating in an apicoplast.

apicoplast_gene [SO_0000091]

A gene from apicoplast sequence.

apicoplast_sequence [SO_0000743]

DNA belonging to the genome of an apicoplast, a non-photosynthetic plastid.

aptamer [SO_0000031]

DNA or RNA molecules that have been selected from random pools based on their ability to bind other molecules.

archaeal_intron [SO_1001271]

An intron characteristic of Archaeal tRNA and rRNA genes, where intron transcript generates a bulge-helix-bulge motif that is recognised by a splicing endoribonuclease. Intron characteristic of tRNA genes; splices by an endonuclease-ligase mediated mechanism.

archaeosine [SO_0001323]

Archaeosine is a modified 7-deazoguanosine.

arginine [SO_0001451]

A positively charged, hydrophilic amino acid encoded by the codons CGN (CGT, CGC, CGA and CGG), AGA and AGG. A place holder for a cross product with chebi.

arginine_tRNA_primary_transcript [SO_0000212]

A primary transcript encoding arginyl tRNA (SO:0001036).

arginyl_tRNA [SO_0001036]

A tRNA sequence that has an arginine anticodon, and a 3’ arginine binding region.

ARIA [SO_0001925]

A non-coding RNA transcript, derived from the transcription of the telomere. These transcripts consist of C rich repeats. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.

ARRET [SO_0001924]

A non coding RNA transcript, complementary to subtelomeric tract of TERRA transcript but devoid of the repeats. Telomeric transcription has been documented in mammals, birds, fish, plants and yeast. Requested by Antonia Lock, October 2012.

ARS [SO_0000436]

A sequence that can autonomously replicate, as a plasmid, when transformed into a bacterial host.

ARS_consensus_sequence [SO_0002004]

The ACS is an 11-bp sequence of the form 5’-WTTTAYRTTTW-3’ which is at the core of every yeast ARS, and is necessary but not sufficient for recognition and binding by the origin recognition complex (ORC). Functional ARSs require an ACS, as well as other cis elements in the 5’ (C domain) and 3’ (B domain) flanking sequences of the ACS.
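
Consensus sequences written with IUPAC ambiguity codes, such as the ACS above, can be searched for by translating each code into a regular-expression character class. A minimal sketch, assuming a DNA input on the forward strand only; the function names are hypothetical and the lookup table covers only the codes this consensus uses (W = A or T, Y = C or T, R = A or G):

```python
import re

# Subset of IUPAC nucleotide codes needed for the ACS consensus.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "Y": "[CT]", "R": "[AG]",
}

ACS_CONSENSUS = "WTTTAYRTTTW"


def iupac_to_regex(pattern):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC_CLASSES[code] for code in pattern)


def find_acs(seq):
    """Return 0-based start positions of ACS matches on the given strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    regex = re.compile("(?=(" + iupac_to_regex(ACS_CONSENSUS) + "))")
    return [m.start() for m in regex.finditer(seq.upper())]
```

As the definition notes, a match to the ACS is necessary but not sufficient for a functional ARS, so hits from a search like this are candidates only.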

asparagine [SO_0001449]

A polar, hydrophilic amino acid encoded by the codons AAT and AAC. A place holder for a cross product with chebi.

asparagine_tRNA_primary_transcript [SO_0000213]

A primary transcript encoding asparaginyl tRNA (SO:0000256).

asparaginyl_tRNA [SO_0000256]

A tRNA sequence that has an asparagine anticodon, and a 3’ asparagine binding region.

aspartic_acid [SO_0001453]

A negatively charged, hydrophilic amino acid encoded by the codons GAT and GAC. A place holder for a cross product with chebi.

aspartic_acid_tRNA_primary_transcript [SO_0000214]

A primary transcript encoding aspartyl tRNA (SO:0000257).

aspartyl_tRNA [SO_0000257]

A tRNA sequence that has an aspartic acid anticodon, and a 3’ aspartic acid binding region.

ASPE_primer [SO_0001698]

A primer containing an SNV at the 3’ end for accurate genotyping.

assembly_error_correction [SO_0001525]

A region of sequence where the final nucleotide assignment differs from the original assembly due to an improvement that replaces a mistake.

assortment_derived_aneuploid [SO_0000058]

[assortment_derived_aneuploid; assortment-derived_aneuploid]

assortment_derived_aneuploid [SO_0000803]

A multi-chromosome aberration generated by reassortment of other aberration components; presumed to have a deficiency or a duplication.

assortment_derived_deficiency [SO_0000052]

[assortment_derived_deficiency; assortment-derived_deficiency]

assortment_derived_deficiency [SO_0000802]

A multi-chromosome deficiency aberration generated by reassortment of other aberration components.

assortment_derived_deficiency_plus_duplication [SO_0000554]

[assortment_derived_deficiency_plus_duplication]

assortment_derived_deficiency_plus_duplication [SO_0000801]

A multi-chromosome aberration generated by reassortment of other aberration components; presumed to have a deficiency and a duplication.

assortment_derived_duplication [SO_0000437]

[assortment_derived_duplication]

assortment_derived_duplication [SO_0000800]

A multi-chromosome duplication aberration generated by reassortment of other aberration components.

assortment_derived_variation [SO_0001504]

A chromosome variation derived from an event during meiosis.

asx_motif [SO_0001106]

A motif of five consecutive residues and two H-bonds in which: Residue(i) is Aspartate or Asparagine (Asx), side-chain O of residue(i) is H-bonded to the main-chain NH of residue(i+2) or (i+3), main-chain CO of residue(i) is H-bonded to the main-chain NH of residue(i+3) or (i+4).

asx_turn [SO_0000912]

A motif of three consecutive residues and one H-bond in which: residue(i) is Aspartate or Asparagine (Asx), the side-chain O of residue(i) is H-bonded to the main-chain NH of residue(i+2).

asx_turn_left_handed_type_one [SO_0001129]

Left handed type I (dihedral angles):- Residue(i): -140 degrees < chi(1) - 120 degrees < -20 degrees, -90 degrees < psi + 120 degrees < +40 degrees. Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.

asx_turn_left_handed_type_two [SO_0001130]

Left handed type II (dihedral angles):- Residue(i): -140 degrees < chi(1) - 120 degrees < -20 degrees, +80 degrees < psi + 120 degrees < +180 degrees. Residue(i+1): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.

asx_turn_right_handed_type_one [SO_0001132]

Right handed type I (dihedral angles):- Residue(i): -140 degrees < chi(1) - 120 degrees < -20 degrees, -90 degrees < psi + 120 degrees < +40 degrees. Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.

asx_turn_right_handed_type_two [SO_0001131]

Right handed type II (dihedral angles):- Residue(i): -140 degrees < chi(1) - 120 degrees < -20 degrees, +80 degrees < psi + 120 degrees < +180 degrees. Residue(i+1): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.

asymmetric_RNA_internal_loop [SO_0000021]

An internal RNA loop where one of the strands includes more bases than the corresponding region on the other strand.

attB_site [SO_0000943]

An integration/excision site of a bacterial chromosome at which a recombinase acts to insert foreign DNA containing a cognate integration/excision site.

attC_site [SO_0000950]

An attC site is a sequence required for the integration of a DNA of an integron.

attCtn_site [SO_0001043]

An attachment site located on a conjugative transposon and used for site-specific integration of a conjugative transposon.

attenuator [SO_0000140]

A sequence segment located within the five prime end of an mRNA that causes premature termination of translation.

attI_site [SO_0000367]

A region within an integron, adjacent to an integrase, at which site specific recombination involving an attC_site takes place.

attL_site [SO_0000944]

A region that results from recombination between attP_site and attB_site, composed of the 5’ portion of attB_site and the 3’ portion of attP_site.

attP_site [SO_0000942]

An integration/excision site of a phage chromosome at which a recombinase acts to insert the phage DNA at a cognate integration/excision site on a bacterial chromosome.

attR_site [SO_0000945]

A region that results from recombination between attP_site and attB_site, composed of the 5’ portion of attP_site and the 3’ portion of attB_site.

AUG_initiated_uORF [SO_0002150]

A uORF beginning with the canonical start codon AUG.

autocatalytically_spliced_intron [SO_0000588]

A self spliced intron.

autoregulated [SO_0000471]

The gene product is involved in its own transcriptional regulation.

autosynaptic_chromosome [SO_1000136]

An autosynaptic chromosome is the aneuploid product of recombination between a pericentric inversion and a cytologically wild-type chromosome.

B_box [SO_0000620]

A variably distant linear promoter region recognized by TFIIIC, with consensus sequence AGGTTCCAnnCC. Binds TFIIIC.

BAC [SO_0000153]

Bacterial Artificial Chromosome, a cloning vector that can be propagated as mini-chromosomes in a bacterial host. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

BAC_clone [SO_0000764]

[BAC_clone]

BAC_cloned_genomic_insert [SO_0000992]

A region of DNA that has been inserted into the bacterial genome using a bacterial artificial chromosome. Requested by Andy Schroder - Flybase Harvard, Nov 2006.

BAC_end [SO_0000999]

A region of sequence from the end of a BAC clone that may provide a highly specific marker. Requested by Keith Boroevich December, 2006.

BAC_read_contig [SO_0001866]

A contig of BAC reads. Requested by Bayer Cropscience December, 2011.

bacterial_RNApol_promoter [SO_0000613]

A DNA sequence to which bacterial RNA polymerase binds, to begin transcription. The former parent, RNA_polymerase_promoter (SO:0001203), was merged with promoter (SO:0000167) in Aug 2020 as part of GREEKC.

bacterial_RNApol_promoter_region [SO_0000843]

A region which is part of a bacterial RNA polymerase promoter. This is a manufactured term to allow the parts of bacterial_RNApol_promoter to have an is_a path back to the root.

bacterial_RNApol_promoter_sigma_70 [SO_0001671]

A DNA sequence to which bacterial RNA polymerase sigma 70 binds, to begin transcription.

bacterial_RNApol_promoter_sigma_ecf_element [SO_0001913]

A bacterial promoter with sigma ecf factor binding dependency. This is a type of bacterial promoter that requires a sigma ECF factor to bind to identified -10 and -35 sequence regions in order to mediate binding of the RNA polymerase to the promoter region as part of transcription initiation. Requested by Kevin Clancy, Invitrogen, May 2012.

bacterial_RNApol_promoter_sigma54 [SO_0001672]

A DNA sequence to which bacterial RNA polymerase sigma 54 binds, to begin transcription.

bacterial_terminator [SO_0000614]

A terminator signal for bacterial transcription. Moved to transcriptional_cis_regulatory_region (SO:0001055) from gene_group_regulatory_region (SO:0000752) on 11 Feb 2021 when SO:0000752 was merged into SO:0001055. See GitHub Issue #529.

base_call_error_correction [SO_0001526]

A region of sequence where the final nucleotide assignment is different from that given by the base caller due to an improvement that replaces a mistake.

base_pair [SO_0000028]

Two bases paired opposite each other by hydrogen bonds creating a secondary structure.

benign_variant [SO_0001770]

A variant that does not affect the function of the gene or cause disease.

beta_bulge [SO_0001107]

A motif of three residues within a beta-sheet in which the main chains of two consecutive residues are H-bonded to that of the third, and in which the dihedral angles are as follows: Residue(i): -140 degrees < phi(l) -20 degrees, -90 degrees < psi(l) < 40 degrees. Residue(i+1): -180 degrees < phi < -25 degrees or +120 degrees < phi < +180 degrees, +40 degrees < psi < +180 degrees or -180 degrees < psi < -120 degrees.

beta_bulge_loop [SO_0001108]

A motif of three residues within a beta-sheet consisting of two H-bonds. Beta bulge loops often occur at the loop ends of beta-hairpins.

beta_bulge_loop_five [SO_0001109]

A motif of three residues within a beta-sheet consisting of two H-bonds in which: the main-chain NH of residue(i) is H-bonded to the main-chain CO of residue(i+4), the main-chain CO of residue i is H-bonded to the main-chain NH of residue(i+3), these loops have an RL nest at residues i+2 and i+3.

beta_bulge_loop_six [SO_0001110]

A motif of three residues within a beta-sheet consisting of two H-bonds in which: the main-chain NH of residue(i) is H-bonded to the main-chain CO of residue(i+5), the main-chain CO of residue i is H-bonded to the main-chain NH of residue(i+4), these loops have an RL nest at residues i+3 and i+4.

beta_turn [SO_0001133]

A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles of the second and third residues, which are the basis for sub-categorization.

beta_turn_left_handed_type_one [SO_0001134]

Left handed type I: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles:- Residue(i+1): -140 degrees > phi > -20 degrees, -90 degrees > psi > +40 degrees. Residue(i+2): -140 degrees > phi > -20 degrees, -90 degrees > psi > +40 degrees.

beta_turn_left_handed_type_two [SO_0001135]

Left handed type II: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees > phi > -20 degrees, +80 degrees > psi > +180 degrees. Residue(i+2): +20 degrees > phi > +140 degrees, -40 degrees > psi > +90 degrees.

beta_turn_right_handed_type_one [SO_0001136]

Right handed type I: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees. Residue(i+2): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.

beta_turn_right_handed_type_two [SO_0001137]

Right handed type II: A motif of four consecutive residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth. It is characterized by the dihedral angles: Residue(i+1): -140 degrees < phi < -20 degrees, +80 degrees < psi < +180 degrees. Residue(i+2): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.

beta_turn_type_eight [SO_0001155]

A motif of four consecutive peptide residues that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -60 degrees, psi ~ -30 degrees. Residue(i+2): phi ~ -120 degrees, psi ~ 120 degrees.

beta_turn_type_six [SO_0001150]

A motif of four consecutive peptide residues of type VIa or type VIb and where the i+2 residue is cis-proline.

beta_turn_type_six_a [SO_0001151]

A motif of four consecutive peptide residues, of which the i+2 residue is proline, and that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -60 degrees, psi ~ 120 degrees. Residue(i+2): phi ~ -90 degrees, psi ~ 0 degrees.

beta_turn_type_six_a_one [SO_0001152]

A type VIa beta turn with the following phi and psi angles on amino acid residues 2 and 3: phi-2 = -60 degrees, psi-2 = 120 degrees, phi-3 = -90 degrees, psi-3 = 0 degrees.

beta_turn_type_six_a_two [SO_0001153]

A type VIa beta turn with the following phi and psi angles on amino acid residues 2 and 3: phi-2 = -120 degrees, psi-2 = 120 degrees, phi-3 = -60 degrees, psi-3 = 0 degrees.

beta_turn_type_six_b [SO_0001154]

A motif of four consecutive peptide residues, of which the i+2 residue is proline, and that may contain one H-bond, which, if present, is between the main-chain CO of the first residue and the main-chain NH of the fourth and is characterized by the dihedral angles: Residue(i+1): phi ~ -120 degrees, psi ~ 120 degrees. Residue(i+2): phi ~ -60 degrees, psi ~ 0 degrees.

bidirectional_gene_fusion [SO_0002086]

A sequence variant whereby two genes on alternate strands have become joined. Requested by SNPEFF team. Feb 2016.

bidirectional_promoter [SO_0000568]

A promoter that can allow for transcription in both directions. Definition updated in Aug 2020 by Dave Sant.

binding_site [SO_0000409]

A biological_region of sequence that, in the molecule, interacts selectively and non-covalently with other molecules. A region on the surface of a molecule that may interact with another molecule. When applied to polypeptides: Amino acids involved in binding or interactions. It can also apply to an amino acid bond which is represented by the positions of the two flanking amino acids. See GO:0005488 : binding.

biochemical_region_of_peptide [SO_0100001]

A region of a peptide that is involved in a biochemical function. Range.

biological_region [SO_0001411]

A region defined by its disposition to be involved in a biological process.

biomaterial_region [SO_0001409]

A region which is intended for use in an experiment.

bipartite_duplication [SO_1000149]

An interchromosomal mutation whereby the (large) region between the first two breaks listed is lost, and the two flanking segments (one of them centric) are joined as a translocation to the free ends resulting from the third break.

bipartite_inversion [SO_1000151]

A chromosomal inversion caused by three breaks in the same chromosome; both central segments are inverted in place (i.e., they are not transposed).

blocked_reading_frame [SO_0000718]

A reading_frame that is interrupted by one or more stop codons; usually identified through inter-genomic sequence comparisons. Term requested by Rama from SGD.

blunt_end_restriction_enzyme_cleavage_junction [SO_0001693]

A restriction enzyme cleavage site where both strands are cut at the same position.

blunt_end_restriction_enzyme_cleavage_site [SO_0001691]

A restriction enzyme recognition site that, when cleaved, results in no overhangs.

bound_by_factor [SO_0000277]

An attribute describing a sequence that is bound by another molecule. Formerly called transcript_by_bound_factor.

bound_by_nucleic_acid [SO_0000876]

An attribute describing a sequence that is bound by a nucleic acid.

bound_by_protein [SO_0000875]

An attribute describing a sequence that is bound by a protein.

boundary_element [SO_0002020]

Boundary elements are DNA motifs that prevent heterochromatin from spreading into neighboring euchromatic regions. Requested by Antonia Lock. Insulator is included as a related synonym since this is used to refer to insulator in the literature (NCBI:cf).

branch_site [SO_0000611]

A pyrimidine rich sequence near the 3’ end of an intron to which the 5’ end becomes covalently bound during nuclear splicing. The resulting structure resembles a lariat.

BREd_motif [SO_0001663]

A core RNA polymerase II promoter element with consensus (G/A)T(T/G/A)(T/A)(G/T)(T/G)(T/G).

BREu_motif [SO_0000016]

A sequence element characteristic of some RNA polymerase II promoters, located immediately upstream of some TATA box elements at -37 to -32 with respect to the TSS (+1). Consensus sequence is (G|C)(G|C)(G|A)CGCC. Binds TFIIB.

Bruno_response_element [SO_0001181]

A cis-acting element found in the 3’ UTR of some mRNA which is bound by the Drosophila Bruno protein and its homologs. Not to be confused with BRE_motif (SO:0000016), which binds transcription factor II B.

C_box [SO_0000622]

An RNA polymerase III type 1 promoter with consensus sequence CAnnCCn.

C_cluster [SO_0000558]

Genomic DNA of immunoglobulin/T-cell receptor gene including more than one C-gene.

C_D_box_snoRNA [SO_0000593]

Most box C/D snoRNAs also contain long (>10 nt) sequences complementary to rRNA. Boxes C and D, as well as boxes C’ and D’, are usually located in close proximity, and form a structure known as the box C/D motif. This motif is important for snoRNA stability, processing, nucleolar targeting and function. A small number of box C/D snoRNAs are involved in rRNA processing; most, however, are known or predicted to serve as guide RNAs in ribose methylation of rRNA. Targeting involves direct base pairing of the snoRNA at the rRNA site to be modified and selection of a rRNA nucleotide a fixed distance from box D or D'.

C_D_box_snoRNA_encoding [SO_0000585]

snoRNA that is associated with guiding methylation of nucleotides. It contains two short conserved sequence motifs: C (RUGAUGA) near the 5-prime end and D (CUGA) near the 3-prime end.

C_D_box_snoRNA_primary_transcript [SO_0000595]

A primary transcript encoding a small nucleolar RNA of the box C/D family.

C_gene_segment [SO_0000478]

Genomic DNA of immunoglobulin/T-cell receptor gene including C-region (and introns if present) with 5’ UTR (SO:0000204) and 3’ UTR (SO:0000205).

C_region [SO_0001834]

The constant region of an immunoglobulin polypeptide sequence.

c_terminal_region [SO_0100015]

The more polar, carboxy-terminal region of the signal peptide (approx 3-7 aa).

C_to_A_transversion [SO_1000019]

A transversion from cytidine to adenine.

C_to_G_transversion [SO_1000020]

A transversion of a cytidine to a guanine.

C_to_T_transition [SO_1000011]

A transition of a cytidine to a thymine.
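
The C_to_* terms above rest on the usual distinction between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse). A minimal sketch of that classification follows; the function name classify_substitution is hypothetical and only DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base DNA change."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # A change that stays within one chemical class is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

Under this sketch, C to T is reported as a transition, while C to A and C to G are reported as transversions, matching the three term names.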

C_to_T_transition_at_pCpG_site [SO_1000012]

The transition of cytidine to thymine occurring at a pCpG site as a consequence of the spontaneous deamination of 5’-methylcytidine.

CAGE_cluster [SO_0001917]

A kind of transcription_initiation_cluster defined by the clustering of CAGE tags on a sequence region.

CAGE_tag [SO_0001916]

A CAGE tag is a sequence tag that corresponds to 5’ ends of mRNA at cap sites, produced by cap analysis gene expression and used to identify transcriptional start sites.

candidate_gene [SO_0001867]

A gene suspected of being involved in the expression of a trait. Requested by Bayer Cropscience December, 2011.

canonical_five_prime_splice_site [SO_0000677]

The canonical 5’ splice site has the sequence “GT”.

canonical_splice_site [SO_0000675]

The major class of splice site with dinucleotides GT and AG for donor and acceptor sites, respectively.

canonical_three_prime_splice_site [SO_0000676]

The canonical 3’ splice site has the sequence “AG”.
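
As an illustration of the canonical splice site definitions, the check below tests whether an intron sequence starts with the donor dinucleotide GT and ends with the acceptor dinucleotide AG. It is a sketch only: the function name is hypothetical, the intron is assumed to be given 5’ to 3’ on the transcribed strand, and U is treated as T so that RNA input also works.

```python
def has_canonical_splice_sites(intron):
    """True if the intron begins with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper().replace("U", "T")
    # At least four bases are needed so donor and acceptor do not overlap.
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")
```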

cap [SO_0000581]

A structure consisting of a 7-methylguanosine in 5’-5’ triphosphate linkage with the first nucleotide of an mRNA. It is added post-transcriptionally, and is not encoded in the DNA.

capped [SO_0000146]

An attribute describing when a sequence, usually an mRNA, is capped by the addition of a modified guanine nucleotide at the 5’ end.

capped_mRNA [SO_0000862]

An mRNA that is capped.

capped_primary_transcript [SO_0000861]

A primary transcript that is capped.

CArG_box [SO_0002156]

A promoter element bound by the MADS family of transcription factors with consensus 5’-(C/T)TA(T/A)4TA(G/A)-3’. Requested by Antonia Lock.

cassette_array_member [SO_0005847]

A gene that is a member of a gene cassette, which is a mobile genetic element.

cassette_pseudogene [SO_0001434]

A cassette pseudogene is a kind of gene in an inactive form which may recombine at a telomeric locus to form a functional copy. Requested by the Trypanosome community.

catalytic_residue [SO_0001104]

Amino acid involved in the activity of an enzyme. Discrete.

catmat_left_handed_four [SO_0100005]

A motif of 4 consecutive residues with dihedral angles as follows: res i: phi -90 bounds -120 to -60, res i: psi -10 bounds -50 to 30, res i+1: phi -90 bounds -120 to -60, res i+1: psi -10 bounds -50 to 30, res i+2: phi -75 bounds -100 to -50, res i+2: psi 140 bounds 110 to 170. The extra restriction of the length of the O to O distance is similar, that it be less than 5 Angstrom. In this case these two Oxygen atoms are the main chain carbonyl oxygen atoms of residues i-1 and i+2.

catmat_left_handed_three [SO_0100004]

A motif of 3 consecutive residues with dihedral angles as follows: res i: phi -90 bounds -120 to -60, res i: psi -10 bounds -50 to 30, res i+1: phi -75 bounds -100 to -50, res i+1: psi 140 bounds 110 to 170. An extra restriction of the length of the O to O distance would be useful, that it be less than 5 Angstrom. More precisely these two oxygens are the main chain carbonyl oxygen atoms of residues i-1 and i+1.

catmat_right_handed_four [SO_0100007]

A motif of 4 consecutive residues with dihedral angles as follows: res i: phi -90 bounds -120 to -60, res i: psi -10 bounds -50 to 30, res i+1: phi -90 bounds -120 to -60, res i+1: psi -10 bounds -50 to 30, res i+2: phi -75 bounds -100 to -50, res i+2: psi 140 bounds 110 to 170. The extra restriction of the length of the O to O distance is similar, that it be less than 5 Angstrom. In this case these two Oxygen atoms are the main chain carbonyl oxygen atoms of residues i-1 and i+2.

catmat_right_handed_three [SO_0100006]

A motif of 3 consecutive residues with dihedral angles as follows: res i: phi -90 bounds -120 to -60, res i: psi -10 bounds -50 to 30, res i+1: phi -75 bounds -100 to -50, res i+1: psi 140 bounds 110 to 170. An extra restriction of the length of the O to O distance would be useful, that it be less than 5 Angstrom. More precisely these two oxygens are the main chain carbonyl oxygen atoms of residues i-1 and i+1.

CCA_tail [SO_0001175]

Base sequence at the 3’ end of a tRNA. The 3’-hydroxyl group on the terminal adenosine is the attachment point for the amino acid.

CCAAT_motif [SO_0001856]

A promoter element with consensus sequence CCAAT, bound by a protein complex that represses transcription in response to low iron levels.

cDNA_clone [SO_0000317]

Complementary DNA; A piece of DNA copied from an mRNA and spliced into a vector for propagation in a suitable host. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

cDNA_match [SO_0000689]

A match against cDNA sequence.

CDRE_motif [SO_0001865]

An RNA polymerase II promoter element found in the promoters of genes regulated by calcineurin. The consensus sequence is GNGGCKCA.
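
Several definitions in this list give a consensus in IUPAC nucleotide codes (for example GNGGCKCA here, where N is any base and K is G or T). The sketch below shows one way such a consensus could be turned into a regular expression and scanned against a sequence. The helper names are hypothetical, only the given strand is searched, and a lookahead is used so that overlapping hits are also reported.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a zero-width lookahead pattern."""
    body = "".join(IUPAC[code] for code in consensus.upper())
    return re.compile("(?=" + body + ")")


def find_motif(consensus, sequence):
    """Return 0-based start positions of every (possibly overlapping) hit."""
    pattern = consensus_to_regex(consensus)
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

For instance, find_motif("GNGGCKCA", "TTGAGGCTCATT") returns [2]. A fuller search would also scan the reverse complement, which this sketch leaves out.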

CDS [SO_0000316]

A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon.
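
The CDS definition can be read as a simple structural test on a nucleotide string. The sketch below makes several assumptions that go beyond the definition itself: the standard genetic code stop codons (TAA, TAG, TGA), ATG as the default start codon, a length that is a multiple of three, and no in-frame stop before the final codon. The function name looks_like_cds is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def looks_like_cds(seq, start_codons=("ATG",)):
    """True if seq begins with a start codon, ends with a stop codon,
    and has no in-frame stop codon in between."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in start_codons or codons[-1] not in STOP_CODONS:
        return False
    # Interior codons must all be sense codons.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

Non-canonical starts such as the CTG_start_codon term can be allowed by passing start_codons=("ATG", "CTG").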

CDS_extension [SO_0002227]

A sequence variant extending the CDS, that causes elongation of the resulting polypeptide sequence. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)

CDS_five_prime_extension [SO_0002228]

A sequence variant extending the CDS at the 5’ end, that causes elongation of the resulting polypeptide sequence at the N terminus. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)

CDS_fragment [SO_0001384]

A portion of a CDS that is not the complete CDS.

CDS_independently_known [SO_1001246]

A CDS with the evidence status of being independently known.

CDS_predicted [SO_1001254]

A CDS that is predicted.

CDS_region [SO_0000851]

A region of a CDS.

CDS_supported_by_domain_match_data [SO_1001249]

A CDS that is supported by domain similarity.

CDS_supported_by_EST_or_cDNA_data [SO_1001259]

A CDS that is supported by similarity to EST or cDNA data.

CDS_supported_by_peptide_spectrum_match [SO_0002071]

A CDS that is supported by proteomics data.

CDS_supported_by_sequence_similarity_data [SO_1001251]

A CDS that is supported by sequence similarity data.

CDS_three_prime_extension [SO_0002229]

A sequence variant extending the CDS at the 3’ end, that causes elongation of the resulting polypeptide sequence at the C terminus. Added as per request by Edward Wallace GitHub issue #480 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/480)

central_hydrophobic_region_of_signal_peptide [SO_0100016]

The central, hydrophobic region of the signal peptide (approx 7-15 aa).

centromere [SO_0000577]

A region of chromosome where the spindle fibers attach during mitosis and meiosis.

centromere_DNA_Element_I [SO_0001493]

A centromere DNA Element I (CDEI) is a conserved region, part of the centromere, consisting of a consensus region composed of 8-11 bp which enables binding by the centromere binding factor 1 (Cbf1p). This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.

centromere_DNA_Element_II [SO_0001494]

A centromere DNA Element II (CDEII) is a conserved region, part of the centromere, consisting of a consensus region that is AT-rich and ~75-100 bp in length. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.

centromere_DNA_Element_III [SO_0001495]

A centromere DNA Element III (CDEIII) is a conserved region, part of the centromere, consisting of a 25-bp consensus region which enables binding by the centromere DNA binding factor 3 (CBF3) complex. This term was requested 2009-10-16 by Michel Dumontier, tracker id 2880699.

centromeric_repeat [SO_0001797]

A repeat region found within the modular centromere.

chimeric_cDNA_clone [SO_0000810]

A cDNA clone invalidated because it is chimeric.

ChIP_seq_region [SO_0001697]

A region of sequence identified by ChIP-seq technology to contain a protein binding site.

chloroplast_chromosome [SO_0000820]

A chromosome originating in a chloroplast.

chloroplast_DNA [SO_0001033]

DNA belonging to the genome of a chloroplast, a photosynthetic plastid. This term is used by MO.

chloroplast_DNA_read [SO_0001930]

A sequencer read of a chloroplast DNA sample. Requested by Bayer Cropscience, October, 2012.

chloroplast_sequence [SO_0000745]

DNA belonging to the genome of a chloroplast, a green plastid for photosynthesis.

chromoplast_chromosome [SO_0000821]

A chromosome originating in a chromoplast.

chromoplast_gene [SO_0000093]

A gene from chromoplast_sequence.

chromoplast_sequence [SO_0000744]

DNA belonging to the genome of a chromoplast, a colored plastid for synthesis and storage of pigments.

chromosomal_deletion [SO_1000029]

An incomplete chromosome.

chromosomal_regulatory_element [SO_0000626]

Regions of the chromosome that are important for regulating binding of chromosomes to the nuclear matrix.

chromosomal_structural_element [SO_0000628]

Regions of the chromosome that are important for structural elements.

chromosomal_transposition [SO_0000453]

A chromosome structure variant whereby a region of a chromosome has been transferred to another position. Among interchromosomal rearrangements, the term transposition is reserved for that class in which the telomeres of the chromosomes involved are coupled (that is to say, form the two ends of a single DNA molecule) as in wild-type.

chromosomal_variation_attribute [SO_0001509]

An attribute of a change in the structure or number of chromosomes.

chromosomally_aberrant_genome [SO_0001524]

When a genome contains an abnormal number of chromosomes.

chromosome arm [SO_0000105]

A region of the chromosome between the centromere and the telomere. Human chromosomes have two arms, the p arm (short) and the q arm (long) which are separated from each other by the centromere.

chromosome band [SO_0000341]

A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark. “Band” is a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band.

chromosome_breakage_sequence [SO_0000670]

A sequence within the micronuclear DNA of ciliates at which chromosome breakage and telomere addition occurs during nuclear differentiation.

chromosome_breakpoint [SO_0001021]

A chromosomal region that may sustain a double-strand break, resulting in a recombination event.

chromosome_fission [SO_1000141]

A chromosome that occurred by the division of a larger chromosome.

chromosome_number_variation [SO_1000182]

A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number.
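
The condition in this definition is a divisibility test. As a small sketch (the function name is hypothetical, and both arguments are assumed to be positive whole-chromosome counts):

```python
def is_chromosome_number_variation(chromosome_count, haploid_number):
    """True if the complement is not an exact multiple of the haploid number."""
    if chromosome_count <= 0 or haploid_number <= 0:
        raise ValueError("counts must be positive integers")
    return chromosome_count % haploid_number != 0
```

With a haploid number of 23, a complement of 47 satisfies the definition, whereas 46 (diploid) and 69 (triploid) do not.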

chromosome_part [SO_0000830]

A region of a chromosome. This is a manufactured term that serves the purpose of allowing the parts of a chromosome to have an is_a path to the root.

chromosome_structure_variation [SO_1000183]

An alteration of the genome that leads to a change in the structure or number of one or more chromosomes.

chromosome_variation [SO_0000240]

A deviation in chromosome structure or number.

circular [SO_0000988]

A quality of a nucleotide polymer that has no terminal nucleotide residues. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

circular_double_stranded_DNA_chromosome [SO_0000958]

Structural unit composed of a self-replicating, double-stranded, circular DNA molecule.

circular_double_stranded_RNA_chromosome [SO_0000967]

Structural unit composed of a self-replicating, double-stranded, circular RNA molecule.

circular_single_stranded_DNA_chromosome [SO_0000960]

Structural unit composed of a self-replicating, single-stranded, circular DNA molecule.

circular_single_stranded_RNA_chromosome [SO_0000966]

Structural unit composed of a self-replicating, single-stranded, circular RNA molecule.

cis_acting_homologous_chromosome_pairing_region [SO_0002025]

A genome region where chromosome pairing occurs preferentially during homologous chromosome pairing in early meiotic prophase of meiosis I. Comment: An example of this is the Sme2 locus in the fission yeast S. pombe, where it is coincident with a ribonuclear complex termed the “Mei2 dot”. This term was requested by Val Wood, PomBase.

cis_regulatory_frameshift_element [SO_0001427]

A structural region in an RNA molecule which promotes ribosomal frameshifting of cis coding sequence. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

cis_regulatory_module [SO_0000727]

A regulatory region where transcription factor binding sites are clustered to regulate various aspects of transcription activities. (CRMs can be located a few kb to hundreds of kb upstream of the core promoter, in the coding sequence, within introns, or in the untranslated regions (UTR) sequences, and even on a different chromosome). A single gene can be regulated by multiple CRMs to give precise control of its spatial and temporal expression. CRMs function as nodes in large, intertwined regulatory networks. CRM DNA accessibility is subject to regulation by dbTFs and transcription co-TFs. Requested by Stephen Grossmann Dec 2004. Changed relationship from has_part SO:0000235 TF_binding site to TF_binding_site is part_of SO:0000727 CRM in response to requests from GREEKC initiative in Aug 2020. Removed 3’ from definition because 5’ UTRs are included as well, notified by Colin Logie of GREEKC. Nov 9 2020. DS Updated name from ‘CRM’ to ‘cis_regulatory_module’ on 08 Feb 2021. See GitHub Issue #526. DS Added final sentence to definition as part of GREEKC Feb 16, 2021. See GitHub Issue #534.

cis_splice_site [SO_0001419]

Intronic 2 bp region bordering exon. A splice_site that adjacent_to exon and overlaps intron.

class_I_RNA [SO_0000990]

Small non-coding RNA (55-65 nt long) containing highly conserved 5’ and 3’ ends (16 and 8 nt, respectively) that are predicted to come together to form a stem structure. Identified in the social amoeba Dictyostelium discoideum and localized in the cytoplasm. Requested by Karen Pilcher - Dictybase. song-Term Tracker-1574577.

class_II_RNA [SO_0000989]

Small non-coding RNA (59-60 nt long) containing 5’ and 3’ ends that are predicted to come together to form a stem structure. Identified in the social amoeba Dictyostelium discoideum and localized in the cytoplasm.

cleaved_for_gpi_anchor_region [SO_0001408]

The C-terminal residues of a polypeptide which are exchanged for a GPI-anchor.

cleaved_initiator_methionine [SO_0000691]

The initiator methionine that has been cleaved from a mature polypeptide sequence.

cleaved_peptide_region [SO_0100011]

The cleaved_peptide_region is the region of a peptide sequence that is cleaved during maturation. Range.

clip [SO_0000303]

Part of the primary transcript that is clipped off during processing.

clone_attribute [SO_0000787]

[clone_attribute]

clone_end [SO_0001793]

A read from an end of the clone sequence.

clone_insert [SO_0000753]

The region of sequence that has been inserted and is being propagated by the clone.

clone_insert_end [SO_0000103]

The end of the clone insert.

clone_insert_start [SO_0000179]

The start of the clone insert.

clone_insert_start [SO_0000767]

[clone_insert_start]

cloned [SO_0000788]

[cloned]

cloned_cDNA [SO_0000792]

[cloned_cDNA]

cloned_cDNA_insert [SO_0000913]

A clone insert made from cDNA.

cloned_genomic [SO_0000791]

[cloned_genomic]

cloned_genomic_insert [SO_0000914]

A clone insert made from genomic DNA.

cloned_region [SO_0000785]

The region of sequence that has been inserted and is being propagated by the clone. Added in response to Lynn Crosby. A clone insert may be composed of many cloned regions.

coding_conserved_region [SO_0000332]

Coding region of sequence similarity by descent from a common ancestor.

coding_end [SO_0000327]

The last base to be translated into protein. It does not include the stop codon.
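
Because a CDS includes its stop codon while coding_end does not, the two coordinates differ by three bases. The sketch below derives coding_start and coding_end from a CDS interval. It assumes 1-based inclusive coordinates, a single contiguous CDS on the forward strand, and a hypothetical function name.

```python
def coding_bounds(cds_start, cds_end):
    """Return (coding_start, coding_end) for a forward-strand CDS.

    Coordinates are 1-based and inclusive. The CDS is assumed to contain
    at least a start codon and a stop codon (six bases).
    """
    if cds_end - cds_start + 1 < 6:
        raise ValueError("CDS too short to hold a start and a stop codon")
    coding_start = cds_start   # first base of the start codon
    coding_end = cds_end - 3   # last base before the stop codon
    return coding_start, coding_end
```

For a CDS spanning positions 101 to 109, this gives (101, 106): positions 107 to 109 are the stop codon and are not translated.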

coding_exon [SO_0000195]

An exon whereby at least one base is part of a codon (here, ‘codon’ is inclusive of the stop_codon).

coding_region_of_exon [SO_0001215]

The region of an exon that encodes for protein sequence. An exon containing either a start or stop codon will be partially coding and partially non coding.

coding_sequence_variant [SO_0001580]

A sequence variant that changes the coding sequence.

coding_start [SO_0000323]

The first base to be translated into protein.

coding_transcript_intron_variant [SO_0001969]

A transcript variant occurring within an intron of a coding transcript.

coding_transcript_variant [SO_0001968]

A transcript variant of a protein coding gene.

coding_transcript_with_retained_intron [SO_0002112]

A protein coding transcript containing a retained intron. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

coding_variant_quality [SO_0001814]

An attribute of a coding genomic variant.

codon [SO_0000360]

A set of (usually) three nucleotide bases in a DNA or RNA sequence, which together code for a unique amino acid or the termination of translation and are contained within the CDS.

codon_redefined [SO_0000882]

An attribute describing the alteration of codon meaning.

cointegrated_plasmid [SO_0001045]

A MGE region consisting of two fused plasmids resulting from a replicative transposition event.

common_variant [SO_0001767]

When a variant from the genomic sequence is commonly found in the general population.

compensatory_transcript_secondary_structure_variant [SO_0001597]

A secondary structure variant that compensates for the change made by a previous variant.

complex_3D_structural_variant [SO_0001600]

A sequence variant that changes the resulting polypeptide structure.

complex_change_of_translational_product_variant [SO_0001602]

A variant that changes the translational product with respect to the reference.

complex_chromosomal_rearrangement [SO_0002062]

A contiguous cluster of translocations, usually the result of a single catastrophic event such as chromothripsis or chromoanasynthesis.

complex_structural_alteration [SO_0001784]

A structural sequence alteration or rearrangement encompassing one or more genome fragments, with 4 or more breakpoints.

complex_substitution [SO_1000005]

When no simple or well defined DNA mutation event describes the observed DNA change, the keyword “complex” should be used. Usually there are multiple equally plausible explanations for the change.

complex_transcript_variant [SO_0001577]

A transcript variant with a complex INDEL: an insertion or deletion that spans an exon/intron border or a coding sequence/UTR border. EBI term: Complex InDel - Insertion or deletion that spans an exon/intron border or a coding sequence/UTR border.

compositionally_biased_region_of_peptide [SO_0001066]

Polypeptide region that is rich in a particular amino acid or homopolymeric and greater than three residues in length. Range.

compound_chromosome [SO_1000042]

A chromosome structure variant where a monocentric element is caused by the fusion of two chromosome arms.

compound_chromosome_arm [SO_0000060]

One arm of a compound chromosome. FLAG - this term should probably be a part_of rather than an is_a.

computed_feature [SO_0000309]

[computed_feature]

computed_feature_by_similarity [SO_0000311]

. similar to:<sequence_id>

conformational_change_variant [SO_0001601]

A sequence variant in the CDS region that causes a conformational change in the resulting polypeptide sequence.

conformational_switch [SO_0001422]

A region of a polypeptide, involved in the transition from one conformational state to another. MM Young, K Kirshenbaum, KA Dill & S Highsmith. Predicting conformational switches in proteins. Protein Science, 1999, 8, 1752-64. K. Kirshenbaum, M.M. Young and S. Highsmith. Predicting Allosteric Switches in Myosins. Protein Science 8(9):1806-1815. 1999.

conjugative_transposon [SO_0000371]

A transposon that encodes function required for conjugation.

consensus [SO_0000993]

A sequence produced from an alignment algorithm that uses multiple sequences as input. Term added Dec 06 to comply with mapping to MGED terms. It should be used to generate consensus regions. The specific cross product terms they require are consensus_region and consensus_mRNA.

consensus_AFLP_fragment [SO_0001991]

A consensus AFLP fragment is an AFLP sequence produced from any alignment algorithm which uses assembled multiple AFLP sequences as input. Requested by Bayer Cropscience September, 2013.

consensus_gDNA [SO_0001931]

Genomic DNA sequence produced from some base calling or alignment algorithm which uses aligned or assembled multiple gDNA sequences as input. Requested by Bayer Cropscience November, 2012.

consensus_mRNA [SO_0000995]

An mRNA sequence produced from an alignment algorithm that uses multiple sequences as input. Do not obsolete without considering MGED mapping.

consensus_region [SO_0000994]

A region that has a known consensus sequence. Do not obsolete without considering MGED mapping.

conservative_amino_acid_substitution [SO_0001607]

A sequence variant of a codon causing the substitution of a similar amino acid for another in the resulting polypeptide.

conservative_inframe_deletion [SO_0001825]

An inframe decrease in cds length that deletes one or more entire codons from the coding sequence but does not change any remaining codons.
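
Read literally, a deletion fits this definition when it removes a whole number of codons and starts on a codon boundary, so that no remaining codon is altered. The sketch below encodes that reading. The function name is hypothetical, and the offset is assumed to be the 0-based position of the first deleted base counted from the first base of the CDS.

```python
def is_conservative_inframe_deletion(cds_offset, deleted_length):
    """True if the deletion removes only entire codons.

    cds_offset: 0-based offset of the first deleted base within the CDS.
    deleted_length: number of bases removed.
    """
    if cds_offset < 0 or deleted_length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    in_frame = deleted_length % 3 == 0
    starts_on_codon_boundary = cds_offset % 3 == 0
    return in_frame and starts_on_codon_boundary
```

A three-base deletion at offset 3 removes exactly the second codon and passes the test. The same deletion at offset 4 is still in frame but splits two codons, so it fails.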

conservative_inframe_insertion [SO_0001823]

An inframe increase in cds length that inserts one or more codons into the coding sequence between existing codons.

conservative_missense_variant [SO_0001585]

A sequence variant whereby at least one base of a codon is changed resulting in a codon that encodes for a different but similar amino acid. These variants may or may not be deleterious.

conserved [SO_0000856]

A region that is similar or identical across more than one species.

conserved_intergenic_variant [SO_0002017]

A sequence variant located in a conserved intergenic region, between genes. Requested by Uma Paila (UVA) for snpEff.

conserved_intron_variant [SO_0002018]

A transcript variant occurring within a conserved region of an intron. Requested by Uma Paila (UVA) for snpEff.

conserved_region [SO_0000330]

Region of sequence similarity by descent from a common ancestor.

constitutive_promoter [SO_0002050]

A promoter that allows for continual transcription of gene.

contig_collection [SO_0001462]

A collection of contigs. See tracker ID: 2138359.

contig_read [SO_0000476]

A DNA sequencer read which is part of a contig.

copy_number_change [SO_0001563]

A sequence variant where copies of a feature (CNV) are either increased or decreased.

copy_number_decrease [SO_0001912]

A sequence variant where copies of a feature are decreased relative to the reference.

copy_number_gain [SO_0001742]

A sequence alteration whereby the copy number of a given region is greater than the reference sequence.

copy_number_increase [SO_0001911]

A sequence variant where copies of a feature are increased relative to the reference.

copy_number_loss [SO_0001743]

A sequence alteration whereby the copy number of a given region is less than the reference sequence.

copy_number_variation [SO_0001019]

A variation that increases or decreases the copy number of a given region.

core_eukaryotic_promoter_element [SO_0001660]

An element that only exists within the promoter region of a eukaryotic gene.

core_promoter_element [SO_0002309]

An element that always exists within the promoter region of a gene. When multiple transcripts exist for a gene, the separate transcripts may have separate core_promoter_elements. Added by Dave to be consistent with other ontologies updated with GREEKC initiative.

cosmid [SO_0000156]

A cloning vector that is a hybrid of lambda phage and a plasmid, and that can be propagated as a plasmid or packaged as a phage, since it retains the lambda cos sites. Paper: Evans GA et al. High efficiency vectors for cosmid microcloning and genomic analysis. Gene 1989; 79(1):9-20. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

cosmid_clone [SO_0000765]

[cosmid_clone]

covalent_binding_site [SO_0001090]

Binding involving a covalent bond.

CRE [SO_0001843]

Merged definition. Target definition: A promoter element with consensus sequence TGACGTCA; bound by the ATF/CREB family of transcription factors. Source definition: A promoter element that contains a core sequence TGACGT, bound by a protein complex that regulates transcription of genes encoding PKA pathway components. New synonym Atf1/Pcr1 recognition motif added in response to Antonia Lock GitHub Issue Request #437, PMID:15716492.

CRISPR [SO_0001459]

Clustered Palindromic Repeats interspersed with bacteriophage derived spacer sequences.

cross_genome_match [SO_0000177]

A nucleotide match against a sequence from another organism.

cross_link [SO_0001087]

Posttranslationally formed amino acid bonds.

cryptic [SO_0000976]

A feature_attribute describing a feature that is not manifest under normal conditions.

cryptic_gene [SO_0001431]

A gene that is not transcribed under normal conditions and is not critical to normal cellular functioning.

cryptic_prophage [SO_0001007]

A remnant of an integrated prophage in the host genome or an “island” in the host genome that includes phage-like genes. This is not cryptic in the same sense as a cryptic gene or cryptic splice site.

cryptic_splice_acceptor [SO_0001570]

A sequence variant whereby a new splice site is created due to the activation of a new acceptor.

cryptic_splice_donor [SO_0001571]

A sequence variant whereby a new splice site is created due to the activation of a new donor.

cryptic_splice_site [SO_0001533]

A splice site that is in part of the transcript not normally spliced. They occur via mutation or transcriptional error.

cryptic_splice_site_variant [SO_0001569]

A sequence variant causing a new (functional) splice site.

cryptogene [SO_1001196]

A maxicircle gene so extensively edited that it cannot be matched to its edited mRNA sequence.

CSL_response_element [SO_0001839]

A promoter element with consensus sequence GTGRGAA, bound by CSL (CBF1/RBP-JK/Suppressor of Hairless/LAG-1) transcription factors.

CsrB_RsmB_RNA [SO_0000377]

An enterobacterial RNA that binds the CsrA protein. The CsrB RNAs contain a conserved motif CAGGXXG that is found in up to 18 copies and has been suggested to bind CsrA. The Csr regulatory system has a strong negative regulatory effect on glycogen biosynthesis, gluconeogenesis and glycogen catabolism and a positive regulatory effect on glycolysis. In other bacteria such as Erwinia carotovora the RsmA protein has been shown to regulate the production of virulence determinants, such as extracellular enzymes. RsmA binds to RsmB regulatory RNA which is also a member of this family.

ct_gene [SO_0000092]

A gene from chloroplast sequence.

CTCF_binding_site [SO_0001974]

A transcription factor binding site with consensus sequence CCGCGNGGNGGCAG, bound by CCCTC-binding factor.

CTG_start_codon [SO_1001273]

A non-canonical start codon of sequence CTG.

CuRE [SO_0001844]

A promoter element bound by copper ion-sensing transcription factors such as S. cerevisiae Mac1p or S. pombe Cuf1; the consensus sequence is HTHNNGCTGD (more specifically TTTGCKCR in budding yeast).

cyanelle_chromosome [SO_0000822]

A chromosome originating in a cyanelle.

cyanelle_gene [SO_0000094]

A gene from cyanelle sequence.

cyanelle_sequence [SO_0000746]

DNA belonging to the genome of a cyanelle, a photosynthetic plastid found in algae.

cyclic_translocation [SO_1000150]

A chromosomal translocation whereby three breaks occurred in three different chromosomes. The centric segment resulting from the first break listed is joined to the acentric segment resulting from the second, rather than the third.

cysteine [SO_0001447]

A polar amino acid encoded by the codons TGT and TGC. A place holder for a cross product with chebi.

cysteine_tRNA_primary_transcript [SO_0000215]

A primary transcript encoding cysteinyl tRNA (SO:0000258).

cysteinyl_tRNA [SO_0000258]

A tRNA sequence that has a cysteine anticodon, and a 3’ cysteine binding region.

cytoplasmic_polypeptide_region [SO_0001073]

Polypeptide region that is localized inside the cytoplasm.

cytosolic_16S_rRNA [SO_0001000]

Cytosolic 16S rRNA is an RNA component of the small subunit of cytosolic ribosomes in prokaryotes. Renamed to cytosolic_16S_rRNA from rRNA_16S on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.

cytosolic_23S_rRNA [SO_0001001]

Cytosolic 23S rRNA is an RNA component of the large subunit of cytosolic ribosomes in prokaryotes. Renamed from rRNA_23S to cytosolic_23S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.

cytosolic_5S_rRNA [SO_0000652]

Cytosolic 5S rRNA is an RNA component of the large subunit of cytosolic ribosomes in both prokaryotes and eukaryotes. Renamed from rRNA_5S to cytosolic_5S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.

cytosolic_LSU_rRNA [SO_0000651]

Cytosolic LSU rRNA is an RNA component of the large subunit of cytosolic ribosomes. Renamed to cytosolic_LSU_rRNA from large_subunit_rRNA on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.

cytosolic_LSU_rRNA_gene [SO_0002361]

A gene that codes for cytosolic LSU rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA [SO_0002343]

Cytosolic rRNA is an RNA component of the small or large subunits of cytosolic ribosomes. Added as a request from EBI. See GitHub Issue #493

cytosolic_rRNA_16S_gene [SO_0002237]

A gene which codes for 16S_rRNA, which functions as the small subunit of the ribosome in prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_18S_gene [SO_0002236]

A gene which codes for 18S_rRNA, which functions as the small subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_23S_gene [SO_0002243]

A gene which codes for 23S_rRNA, which functions as a component of the large subunit of the ribosome in prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_25S_gene [SO_0002242]

A gene which codes for 25S_rRNA, which functions as a component of the large subunit of the ribosome in some eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_28S_gene [SO_0002239]

A gene which codes for 28S_rRNA, which functions as a component of the large subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_5_8S_gene [SO_0002240]

A gene which codes for 5_8S_rRNA (5.8S rRNA), which functions as a component of the large subunit of the ribosome in eukaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_5S_gene [SO_0002238]

A gene which codes for 5S_rRNA, which is a portion of the large subunit of the ribosome in both eukaryotes and prokaryotes. Added as per request by Antonia Lock GitHub issue #472 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/472). Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_rRNA_gene [SO_0002360]

A gene that codes for cytosolic rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

cytosolic_SSU_rRNA [SO_0000650]

Cytosolic SSU rRNA is an RNA component of the small subunit of cytosolic ribosomes. Renamed to cytosolic_SSU_rRNA from small_subunit_rRNA on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493.

cytosolic_SSU_rRNA_gene [SO_0002362]

A gene that codes for cytosolic SSU rRNA. Adjusted hierarchy and names of rRNA_gene terms at the request of Steven Marygold GitHub Issue #513 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/513).

D_cluster [SO_0000559]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including more than one D-gene.

D_DJ_C_cluster [SO_0000504]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene and one C-gene.

D_DJ_cluster [SO_0000505]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene and one DJ-gene.

D_DJ_J_C_cluster [SO_0000506]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene, one J-gene and one C-gene.

D_DJ_J_cluster [SO_0000508]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one D-gene, one DJ-gene, and one J-gene.

D_gene_recombination_feature [SO_0000492]

Recombination signal including D-heptamer, D-spacer and D-nonamer in 5’ of D-region of a D-gene or D-sequence.

D_gene_segment [SO_0000458]

Germline genomic DNA including D-region with 5’ UTR and 3’ UTR, also designated as D-segment.

D_J_C_cluster [SO_0000509]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one D-gene, one J-gene and one C-gene.

D_J_cluster [SO_0000560]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one D-gene and one J-gene.

DArT_marker [SO_0001646]

A genetic marker discovered using Diversity Arrays Technology (DArT).

databank_entry [SO_2000061]

The sequence referred to by an entry in a databank such as GenBank or SwissProt.

dCAPS_primer [SO_0001699]

A primer with one or more mismatches to the DNA template corresponding to a position within a restriction enzyme recognition site.

DCE [SO_0001664]

A discontinuous core element of RNA polymerase II transcribed genes, situated downstream of the TSS. It is composed of three sub elements: SI, SII and SIII.

DCE_SI [SO_0001665]

A sub element of the DCE core promoter element, with consensus sequence CTTC.

DCE_SII [SO_0001666]

A sub element of the DCE core promoter element with consensus sequence CTGT.

DCE_SIII [SO_0001667]

A sub element of the DCE core promoter element with consensus sequence AGC.

DDB_box [SO_0001804]

A conserved polypeptide motif that mediates protein-protein interaction and defines adaptor proteins for DDB1/cullin 4 ubiquitin ligases. Note: PMID:18794354 describes the DDB box, and has lots of alignments, but doesn’t actually come out with a consensus sequence.

de_novo_variant [SO_0001781]

A variant arising in the offspring that is not found in either of the parents.

decayed_exon [SO_0000464]

A non-functional descendant of an exon. Does not have to be part of a pseudogene.

decreased_polyadenylation_variant [SO_0001803]

A transcript processing variant whereby polyadenylation of the encoded transcript is decreased with respect to the reference. Term requested by M. Dumontier, June 1 2011.

decreased_transcript_level_variant [SO_0001541]

A sequence variant that decreases the level of mature, spliced and processed RNA with respect to a reference sequence.

decreased_transcript_stability_variant [SO_0001547]

A sequence variant that decreases transcript stability with respect to a reference sequence.

decreased_transcription_rate_variant [SO_0001552]

A sequence variant that decreases the rate of transcription with respect to a reference sequence.

decreased_translational_product_level [SO_0001555]

A sequence variant which decreases the translational product level with respect to a reference sequence.

defective_conjugative_transposon [SO_0001049]

An island that contains genes for integration/excision and the gene and site for the initiation of intercellular transfer by conjugation. It can be complemented for transfer by a conjugative transposon.

deficient_interchromosomal_transposition [SO_0000063]

An interchromosomal transposition in which one of the four broken ends loses a segment before re-joining.

deficient_intrachromosomal_transposition [SO_0000062]

An intrachromosomal transposition in which one of the four broken ends loses a segment before re-joining.

deficient_inversion [SO_1000171]

A chromosomal deletion whereby three breaks occur in the same chromosome; one central region is lost, and the other is inverted.

deficient_translocation [SO_1000147]

A chromosomal deletion whereby a translocation occurs in which one of the four broken ends loses a segment before re-joining.

delete [SO_0000045]

To remove a subsection of sequence.

delete_U [SO_0000918]

An edit to delete a uridine. The insertion and deletion of uridine (U) residues, usually within coding regions of mRNA transcripts of cryptogenes in the mitochondrial genome of kinetoplastid protozoa.

deletion [SO_0000159]

The point at which one or more contiguous nucleotides were excised.

deletion_breakpoint [SO_0001415]

The point within a chromosome where a deletion begins or ends.

deletion_junction [SO_0000687]

The space between two bases in a sequence which marks the position where a deletion has occurred.

delins [SO_1000032]

A sequence alteration which includes an insertion and a deletion, affecting 2 or more bases. Indels can have a different number of bases than the corresponding reference sequence. The term name was changed from indel to delins on 2/24/2019 to align with the HGVS nomenclature term for a deletion-insertion. Indel was causing confusion in the annotation community (GitHub issue 445). The HGVS nomenclature definition of deletion-insertion (delins) is a sequence change where, compared to a reference sequence, one or more nucleotides are replaced by one or more other nucleotides and which is not a substitution, inversion or conversion.

designed_sequence [SO_0000546]

An oligonucleotide sequence that was designed by an experimenter that may or may not correspond with any natural sequence.

destruction_box [SO_0001805]

A conserved polypeptide motif that can be recognized by both Fizzy/Cdc20- and FZR/Cdh1-activated anaphase-promoting complex/cyclosome (APC/C) and targets a protein for ubiquitination and subsequent degradation by the APC/C. The consensus sequence is RXXLXXXXN.
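
The consensus above can be turned into a simple pattern scan. The following is a minimal sketch, not part of the ontology: it assumes that X stands for any residue, and the function name and the example peptide are made up for illustration.

```python
import re

# Consensus RXXLXXXXN from the definition; X is read here as "any residue"
# (an assumption). The lookahead lets overlapping candidates be reported too.
DESTRUCTION_BOX = re.compile(r"(?=(R..L....N))")


def find_destruction_boxes(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, 9-residue match) for each candidate D-box."""
    return [(m.start(), m.group(1)) for m in DESTRUCTION_BOX.finditer(protein.upper())]


# Hypothetical peptide, used only to exercise the function.
print(find_destruction_boxes("MKRAALGDVTNAAA"))  # [(2, 'RAALGDVTN')]
```

A match only marks a candidate; a consensus hit by itself says nothing about whether the APC/C actually recognizes the site.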

dextrosynaptic_chromosome [SO_1000142]

An autosynaptic chromosome carrying the two right (D = dextro) telomeres. Corrected spelling from dexstrosynaptic_chromosome to dextrosynaptic_chromosome on April 14, 2020 in response to GitHub request #447.

dg_repeat [SO_0001898]

A repeat region which is part of the regional centromere outer repeat region. For the S. pombe project - requested by Val Wood.

dh_repeat [SO_0001899]

A repeat region which is part of the regional centromere outer repeat region. For the S. pombe project - requested by Val Wood.

DHU_loop [SO_0001176]

Non-base-paired sequence of nucleotide bases in tRNA. It contains several dihydrouracil residues.

dicistronic [SO_0000879]

An attribute describing a sequence that contains the code for two gene products.

dicistronic_mRNA [SO_0000716]

An mRNA that has the quality dicistronic.

dicistronic_primary_transcript [SO_1001197]

A primary transcript that has the quality dicistronic.

dicistronic_transcript [SO_0000079]

A transcript that is dicistronic.

dif_site [SO_0000949]

A site at which replicated bacterial circular chromosomes are decatenated by site specific resolvase.

dihydrouridine [SO_0001228]

A modified RNA base in which the 5,6-dihydrouracil is bound to the ribose ring.

dinucleotide_repeat_microsatellite_feature [SO_0000290]

A region of a repeating dinucleotide sequence (two bases).
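
As an illustration of the definition, a repeating two-base unit can be located with a back-referencing regular expression. This is a sketch under stated assumptions: the minimum of three units and the exclusion of homopolymer runs such as AAAA are choices made here, not part of the definition, and the function name is hypothetical.

```python
import re


def find_dinucleotide_repeats(seq: str, min_units: int = 3) -> list[tuple[int, str, int]]:
    """Return (0-based start, repeat unit, unit count) for each dinucleotide run."""
    # Two different bases, then the same pair repeated at least min_units - 1 more times.
    pattern = re.compile(r"([ACGT])(?!\1)([ACGT])(?:\1\2){%d,}" % (min_units - 1))
    return [
        (m.start(), m.group(1) + m.group(2), len(m.group(0)) // 2)
        for m in pattern.finditer(seq.upper())
    ]


# Made-up sequence containing four CA units starting at offset 2.
print(find_dinucleotide_repeats("GGCACACACATT"))  # [(2, 'CA', 4)]
```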

diplotype [SO_0001028]

A diplotype is a pair of haplotypes from a given individual. It is a genotype where the phase is known.

direct [SO_0001514]

A quality of an insertion where the insert is not in a cytologically inverted orientation.

direct_tandem_duplication [SO_1000039]

A tandem duplication where the individual regions are in the same orientation.

direction_attribute [SO_0001029]

The attribute of whether the sequence is the same direction as a feature (forward) or the opposite direction as a feature (reverse).

disabled_reading_frame [SO_0002048]

A reading frame that could encode a full-length protein but which contains obvious mid-sequence disablements (frameshifts or premature stop codons).

disease_associated_variant [SO_0001771]

A variant that has been found to be associated with disease.

disease_causing_variant [SO_0001772]

A variant that has been found to cause disease.

disruptive_inframe_deletion [SO_0001826]

An inframe decrease in CDS length that deletes bases from the coding sequence starting within an existing codon.

disruptive_inframe_insertion [SO_0001824]

An inframe increase in CDS length that inserts one or more codons into the coding sequence within an existing codon.
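
The two disruptive inframe terms both depend on where the change falls relative to codon boundaries. The sketch below shows one way to express that test; the coordinate convention (0-based CDS offset of the first deleted base, or of the base before which sequence is inserted), the function name and the returned labels are assumptions made for this example.

```python
def classify_inframe_change(cds_offset: int, length: int) -> str:
    """Classify an insertion or deletion by frame and codon alignment.

    cds_offset: 0-based CDS position of the first deleted base, or of the
                base immediately after the insertion point (assumed convention).
    length:     number of bases inserted or deleted.
    """
    if cds_offset < 0 or length <= 0:
        raise ValueError("cds_offset must be >= 0 and length must be > 0")
    if length % 3 != 0:
        return "not_inframe"        # the reading frame shifts
    if cds_offset % 3 != 0:
        return "disruptive"         # the change starts inside an existing codon
    return "codon_aligned"          # whole codons are added or removed


print(classify_inframe_change(4, 3))  # disruptive
print(classify_inframe_change(3, 3))  # codon_aligned
```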

distal_duplication [SO_0001928]

A duplication of the distal region of a chromosome. This term is used by Complete Genomics in the structural variant analysis files.

distal_promoter_element [SO_0001670]

A regulatory promoter element that is distal from the TSS.

distant_three_prime_recoding_signal [SO_1001287]

A recoding signal that is found many hundreds of nucleotides 3’ of a redefined stop codon.

disulfide_bond [SO_0001088]

The covalent bond between sulfur atoms that binds two peptide chains or different parts of one peptide chain and is a structural determinant in many protein molecules. 2 discrete & joined.

DJ_C_cluster [SO_0000539]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one DJ-gene and one C-gene.

DJ_gene_segment [SO_0000572]

Genomic DNA of immunoglobulin/T-cell receptor gene in partially rearranged genomic DNA including D-J-region with 5’ UTR and 3’ UTR, also designated as D-J-segment.

DJ_J_C_cluster [SO_0000540]

Genomic DNA in rearranged configuration including at least one DJ-gene, one J-gene and one C-gene.

DJ_J_cluster [SO_0000485]

Genomic DNA of immunoglobulin/T-cell receptor gene in rearranged configuration including at least one DJ-gene, and one J-gene.

DMv1_motif [SO_0001165]

A promoter motif with consensus sequence CARCCCT.

DMv2_motif [SO_0001161]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -60 and -45 relative to the TSS. Consensus sequence is MKSYGGCARCGSYSS. Tends to co-occur with DMv3 (SO:0001160). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).
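
Consensus sequences such as this one use IUPAC nucleotide ambiguity codes (for example M = A or C, K = G or T, S = C or G, Y = C or T, R = A or G). A minimal sketch of expanding such a consensus into a regular expression follows; the helper name and the test sequence are hypothetical, and exact matching with no mismatches is an assumption of the example.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus string into an exact-match regex."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


dmv2 = consensus_to_regex("MKSYGGCARCGSYSS")
# Made-up 15-mer chosen to satisfy every position of the consensus.
print(bool(dmv2.fullmatch("AGCTGGCAGCGCTCG")))  # True
```

The same helper applies to the other consensus strings in this list, such as WATCGATW for DRE_motif.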

DMv3_motif [SO_0001160]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -30 and +15 relative to the TSS. Consensus sequence is KNNCAKCNCTRNY. Tends to co-occur with DMv2 (SO:0001161). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).

DMv4_motif [SO_0001157]

A sequence element characteristic of some RNA polymerase II promoters, located immediately upstream of some TATA box elements with respect to the TSS (+1). Consensus sequence is YGGTCACACTR. Marked spatial preference within core promoter; tend to occur near the TSS, although not as tightly as INR (SO:0000014).

DMv5_motif [SO_0001159]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -50 and -10 relative to the TSS. Consensus sequence is KTYRGTATWTTT. Tends to co-occur with DMv4 (SO:0001157). Tends to not occur with DPE motif (SO:0000015) or MTE (SO:0001162).

DNA [SO_0000352]

An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a 2-deoxy-D-ribose ring connected to a phosphate backbone.

DNA_aptamer [SO_0000032]

DNA molecules that have been selected from random pools based on their ability to bind other molecules.

DNA_binding_site [SO_0001429]

A binding site that, in the molecule, interacts selectively and non-covalently with DNA.

DNA_chromosome [SO_0000954]

Structural unit composed of a self-replicating DNA molecule.

DNA_constraint_sequence [SO_0001009]

A double-stranded DNA used to control macromolecular structure and function.

DNA_invertase_target_sequence [SO_0000660]

[DNA_invertase_target_sequence]

DNA_replication_mode [SO_0000971]

This has been obsoleted as it represents a process. replaced_by: GO:0006260. [DNA replication mode; DNA_replication_mode]

DNA_sequence_secondary_structure [SO_0000142]

A folded DNA sequence.

DNA_transposon [SO_0000182]

A transposon where the mechanism of transposition is via a DNA intermediate.

DNaseI_hypersensitive_site [SO_0000685]

DNA region representing open chromatin structure that is hypersensitive to digestion by DNase I.

DNAzyme [SO_0001012]

A DNA sequence with catalytic activity. Added by request from Colin Batchelor.

dominant_negative_variant [SO_0002052]

A variant where the mutated gene product adversely affects the other (wild type) gene product. Requested by Deanna Church.

double [SO_0000985]

When a nucleotide polymer has two strands that are reverse-complement to one another and pair together. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

double_stranded_cDNA [SO_0000758]

DNA synthesized from RNA by reverse transcriptase that has been copied by PCR to make it double stranded.

double_stranded_DNA_chromosome [SO_0000955]

Structural unit composed of a self-replicating, double-stranded DNA molecule.

double_stranded_RNA_chromosome [SO_0000965]

Structural unit composed of a self-replicating, double-stranded RNA molecule.

downstream_gene_variant [SO_0001632]

A sequence variant located 3’ of a gene. Different groups annotate up and downstream to different lengths. The subtypes are specific and are backed up with cross references.

downstream_transcript_variant [SO_0001987]

A feature variant, where the alteration occurs downstream of the transcript termination site. Requested by Graham Ritchie, EBI/Sanger.

DPE_motif [SO_0000015]

A sequence element characteristic of some RNA polymerase II promoters, positioned from +28 to +32 with respect to the TSS (+1). Experimental results suggest that the DPE acts in conjunction with the INR_motif to provide a binding site for TFIID in the absence of a TATA box to mediate transcription of TATA-less promoters. Consensus sequence is (A|G)G(A|T)(C|T)(G|A|C). Binds TAF6, TAF9.

DPE1_motif [SO_0001164]

A promoter motif with consensus sequence CGGACGT.

DRE [SO_0001845]

A promoter element with consensus sequence CGWGGWNGMM, bound by transcription factors related to RecA and found in promoters of genes expressed following several types of DNA damage or inhibition of DNA synthesis.

DRE_motif [SO_0001156]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -10 and -60 relative to the TSS. Consensus sequence is WATCGATW. This consensus sequence was identified computationally using the MEME algorithm within core promoter sequences from -60 to +40, with an E value of 1.7e-183. Tends to co-occur with Motif 7. Tends to not occur with DPE motif (SO:0000015) or motif 10.

ds_DNA_viral_sequence [SO_0001198]

A ds_DNA_viral_sequence is a viral_sequence that is the sequence of a virus that exists as double stranded DNA.

ds_oligo [SO_0000442]

A double stranded oligonucleotide. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

ds_RNA_viral_sequence [SO_0001169]

A ds_RNA_viral_sequence is a viral_sequence that is the sequence of a virus that exists as double stranded RNA.

DSR_motif [SO_0002005]

The determinant of selective removal (DSR) motif consists of repeats of U(U/C)AAAC. The motif targets meiotic transcripts for removal during mitosis via the exosome. Requested by Antonia Lock (Pombe).

DsrA_RNA [SO_0000378]

DsrA RNA regulates both transcription, by overcoming transcriptional silencing by the nucleoid-associated H-NS protein, and translation, by promoting efficient translation of the stress sigma factor, RpoS. These two activities of DsrA can be separated by mutation: the first of three stem-loops of the 85 nucleotide RNA is necessary for RpoS translation but not for anti-H-NS action, while the second stem-loop is essential for antisilencing and less critical for RpoS translation. The third stem-loop, which behaves as a transcription terminator, can be substituted by the trp transcription terminator without loss of either DsrA function. The sequence of the first stem-loop of DsrA is complementary with the upstream leader portion of RpoS messenger RNA, suggesting that pairing of DsrA with the RpoS message might be important for translational regulation.

duplicated_pseudogene [SO_0001758]

A pseudogene that arose via gene duplication. Generally duplicated pseudogenes have the same structure as the original gene, including intron-exon structure and some regulatory sequence.

Duplication [SO_1000035]

One or more nucleotides are added between two adjacent nucleotides in the sequence; the inserted sequence derives from, or is identical in sequence to, nucleotides adjacent to insertion point.

duplication_attribute [SO_0001523]

An attribute of a duplication, which is an insertion which derives from, or is identical in sequence to, nucleotides present at a known location in the genome.

dye_terminator_read [SO_0001423]

A read produced by the dye terminator method of sequencing.

E_box_motif [SO_0001158]

A sequence element characteristic of some RNA polymerase II promoters, usually located between -60 and +1 relative to the TSS. Consensus sequence is AWCAGCTGWT. Tends to co-occur with DMv2 (SO:0001161). Tends to not occur with DPE motif (SO:0000015).

early_origin_of_replication [SO_0002140]

An origin of replication that initiates early in S phase.

edit_operation [SO_0000916]

[edit_operation; edit operation]

edited [SO_0000116]

An attribute describing a sequence that is modified by editing.

edited_by_A_to_I_substitution [SO_0000600]

[edited_by_A_to_I_substitution]

edited_by_C_insertion_and_dinucleotide_insertion [SO_0000598]

[transcript_edited_by_C-insertion_and_dinucleotide_insertion; edited_by_C_insertion_and_dinucleotide_insertion]

edited_by_C_to_U_substitution [SO_0000599]

[edited_by_C_to_U_substitution]

edited_by_G_addition [SO_0000601]

[edited_by_G_addition]

edited_CDS [SO_0000935]

A CDS that is edited.

edited_mRNA [SO_0000929]

An mRNA that is edited.

edited_transcript [SO_0000873]

A transcript that is edited.

edited_transcript_by_A_to_I_substitution [SO_0000874]

A transcript that has been edited by A to I substitution.

edited_transcript_feature [SO_0000579]

A locatable feature on a transcript that is edited.

editing_block [SO_0000604]

Edited mRNA sequence mediated by a single guide RNA (SO:0000602).

editing_domain [SO_0000606]

Edited mRNA sequence mediated by two or more overlapping guide RNAs (SO:0000602).

editing_variant [SO_0001544]

A transcript processing variant whereby the process of editing is disrupted with respect to the reference.

eight_cutter_restriction_site [SO_0000251]

[eight_cutter_restriction_site; eight-cutter_restriction_site; 8-cutter_restriction_site]

elongated_in_frame_polypeptide_C_terminal [SO_0001612]

A sequence variant within the CDS that causes in frame elongation of the resulting polypeptide sequence at the C terminus.

elongated_in_frame_polypeptide_N_terminal_elongation [SO_0001614]

A sequence variant within the CDS that causes in frame elongation of the resulting polypeptide sequence at the N terminus.

elongated_out_of_frame_polypeptide_C_terminal [SO_0001613]

A sequence variant within the CDS that causes out of frame elongation of the resulting polypeptide sequence at the C terminus.

elongated_out_of_frame_polypeptide_N_terminal [SO_0001615]

A sequence variant within the CDS that causes out of frame elongation of the resulting polypeptide sequence at the N terminus.

elongated_polypeptide [SO_0001609]

An elongation of a polypeptide sequence deriving from a sequence variant extending the CDS.

elongated_polypeptide_C_terminal [SO_0001610]

An elongation of a polypeptide sequence at the C terminus deriving from a sequence variant extending the CDS.

elongated_polypeptide_N_terminal [SO_0001611]

An elongation of a polypeptide sequence at the N terminus deriving from a sequence variant extending the CDS.

encodes_1_polypeptide [SO_1001188]

A gene that is alternately spliced, but encodes only one polypeptide.

encodes_alternate_transcription_start_sites [SO_0001241]

A gene that has multiple possible transcription start sites.

encodes_alternately_spliced_transcripts [SO_0000463]

A gene that encodes more than one transcript.

encodes_different_polypeptides_different_stop [SO_1001190]

A gene that is alternately spliced, and encodes more than one polypeptide, that have overlapping peptide sequences, but use different stop codons.

encodes_disjoint_polypeptides [SO_1001192]

A gene that is alternately spliced, and encodes more than one polypeptide, that do not have overlapping peptide sequences.

encodes_greater_than_1_polypeptide [SO_1001189]

A gene that is alternately spliced, and encodes more than one polypeptide.

encodes_overlapping_peptides [SO_1001195]

A gene that is alternately spliced, and encodes more than one polypeptide, that have overlapping peptide sequences.

encodes_overlapping_peptides_different_start [SO_1001191]

A gene that is alternately spliced, and encodes more than one polypeptide, that have overlapping peptide sequences, but use different start codons.

encodes_overlapping_polypeptides_different_start_and_stop [SO_1001193]

A gene that is alternately spliced, and encodes more than one polypeptide, that have overlapping peptide sequences, but use different start and stop codons.

end_overlapping_gene [SO_0000072]

[end_overlapping_gene]

endogenous_retroviral_gene [SO_0000100]

A proviral gene with origin endogenous retrovirus.

endogenous_retroviral_sequence [SO_0000903]

Endogenous DNA sequences that are likely to have arisen from retroviruses.

Endogenous_Retrovirus_LTR_retrotransposon [SO_0002268]

Endogenous retrovirus (ERV) retrotransposons are abundant in the genomes of jawed vertebrates. Human ERVs (HERVs) are classified based on their homologies to animal retroviruses. Class I families are similar in sequence to mammalian Gammaretroviruses (type C) and Epsilonretroviruses (Type E). Class II families show homology to mammalian Betaretroviruses (Type B) and Deltaretroviruses (Type D). F-Class III families are similar to foamy viruses. Added as per GitHub Issue Request #488 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/488)

endonuclease_spliced_intron [SO_0001216]

An intron that is spliced via endonucleolytic cleavage and ligation rather than transesterification.

endosomal_localization_signal [SO_0001529]

A polypeptide region that targets a polypeptide to the endosome.

engineered [SO_0000783]

An attribute to describe a region that was modified in vitro.

engineered_DNA [SO_0000793]

[engineered_DNA]

engineered_episome [SO_0000779]

An episome that is engineered. Requested by Lynn Crosby Jan 2006.

engineered_foreign_gene [SO_0000281]

A gene that is engineered and foreign.

engineered_foreign_region [SO_0000805]

A region that is engineered and foreign.

engineered_foreign_repetitive_element [SO_0000293]

A repetitive element that is engineered and foreign.

engineered_foreign_transposable_element [SO_0000799]

A transposable_element that is engineered and foreign.

engineered_foreign_transposable_element_gene [SO_0000283]

A transposable element gene that is engineered and foreign.

engineered_fusion_gene [SO_0000288]

A fusion gene that is engineered.

engineered_gene [SO_0000280]

A gene that is engineered.

engineered_insert [SO_0000915]

A clone insert that is engineered.

engineered_plasmid [SO_0000637]

A plasmid that is engineered.

engineered_region [SO_0000804]

A region that is engineered.

engineered_rescue_region [SO_0000794]

A rescue region that is engineered.

engineered_tag [SO_0000807]

A tag that is engineered.

engineered_transposable_element [SO_0000798]

TE that has been modified by manipulations in vitro.

enhanceosome [SO_0001057]

[enhanceosome]

enhancer [SO_0000165]

A cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. An enhancer may participate in an enhanceosome (GO:0034206), a protein-DNA complex formed by the association of a distinct set of general and specific transcription factors with a region of enhancer DNA. The cooperative assembly of an enhanceosome confers specificity of transcriptional regulation. This comment is a placeholder should we start to make cross products with GO.

enhancer_attribute [SO_0000402]

[enhancer_attribute]

enhancer_binding_site [SO_0001461]

A binding site that, in the enhancer region of a nucleotide molecule, interacts selectively and non-covalently with polypeptide residues.

enhancer_bound_by_factor [SO_0000166]

An enhancer bound by a factor.

enhancer_trap_construct [SO_0001479]

An enhancer trap construct is a type of engineered plasmid which is designed to integrate into a genome and express a reporter when the expression from a basic minimal promoter is enhanced by genomic enhancer elements. Enhancer traps contain promoter elements and are not usually mutagenic.

enhancerRNA [SO_0001870]

A short ncRNA that is transcribed from an enhancer. May have a regulatory function.

enzymatic [SO_0001185]

An attribute describing the sequence of a transcript that has catalytic activity with or without an associated ribonucleoprotein. Do not use this for feature annotation. Use enzymatic_RNA (SO:0000372) instead.

enzymatic_RNA [SO_0000372]

An RNA sequence that has catalytic activity with or without an associated ribonucleoprotein. This was moved to be a child of transcript (SO:0000673) because some enzymatic RNA regions are part of primary transcripts and some are part of processed transcripts.

enzymatic_RNA_gene [SO_0002180]

A gene that encodes an enzymatic RNA.

epigenetically_modified [SO_0000133]

This attribute describes a gene where heritable changes other than those in the DNA sequence occur. These changes include: modification to the DNA (such as DNA methylation, the covalent modification of cytosine), and post-translational modification of histones.

epigenetically_modified_gene [SO_0000898]

A gene that is epigenetically modified.

epigenetically_modified_region [SO_0001720]

A biological DNA region implicated in epigenomic changes caused by mechanisms other than changes in the underlying DNA sequence. This includes nucleosomal histone post-translational modifications, nucleosome depletion to render DNA accessible, and post-replicational base modifications such as cytosine modification. Moved from is_a biological_region (SO:0001411) to is_a regulatory_region (SO:0005836) on 11 Feb 2021. GREEKC members pointed out that this would be a more appropriate location. See GitHub Issue #530. 11 Feb 2021 updated definition along with addition of epigenomically_modified_region (SO:0002332). Epigenetically modified region is now not inherited while epigenomically modified region is not annotated as inherited. See GitHub Issue #532 and issue #534.

episome [SO_0000768]

A plasmid that may integrate with a chromosome.

epoxyqueuosine [SO_0001318]

Epoxyqueuosine is a modified 7-deazaguanosine.

ER_retention_signal [SO_0001806]

A C-terminal tetrapeptide motif that mediates retention of a protein in (or retrieval to) the endoplasmic reticulum. In mammals the sequence is KDEL, and in fungi HDEL or DDEL.
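
Because the motif is defined as C-terminal, a check for it reduces to a suffix test. The sketch below uses only the tetrapeptides named in the definition; the clade keys, the function name and the example peptides are assumptions for illustration.

```python
# Tetrapeptides taken from the definition; the dictionary keys are illustrative labels.
ER_RETENTION_MOTIFS = {
    "mammal": ("KDEL",),
    "fungi": ("HDEL", "DDEL"),
}


def has_er_retention_signal(protein: str, clade: str) -> bool:
    """Return True if the protein ends with a retention tetrapeptide for the clade."""
    return protein.upper().endswith(ER_RETENTION_MOTIFS[clade])


# Made-up peptides used only to exercise the function.
print(has_er_retention_signal("MSTAKDEL", "mammal"))  # True
print(has_er_retention_signal("KDELMSTA", "mammal"))  # False
```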

EST [SO_0000345]

A tag produced from a single sequencing read from a cDNA clone or PCR product; typically a few hundred base pairs long. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

EST_match [SO_0000668]

A match against an EST sequence.

eukaryotic_promoter [SO_0002221]

A regulatory_region including the Transcription Start Site (TSS) of a gene and serving as a platform for Pre-Initiation Complex (PIC) assembly, enabling transcription of a gene under certain conditions.

eukaryotic_terminator [SO_0000951]

A signal for RNA polymerase to terminate transcription.

exemplar [SO_0000864]

An attribute describing a sequence that is representative of a class of similar sequences.

exemplar_mRNA [SO_0000734]

An exemplar is a representative cDNA sequence for each gene. The exemplar approach is a method that usually involves some initial clustering into gene groups and the subsequent selection of a representative from each gene group. Added for the MO people.

exon_junction [SO_0000333]

The boundary between two exons in a processed transcript.

exon_loss_variant [SO_0001572]

A sequence variant whereby an exon is lost from the transcript.

exon_of_single_exon_gene [SO_0005845]

An exon that is the only exon in a gene.

exon_region [SO_0000852]

A region of an exon.

exon_variant [SO_0001791]

A sequence variant that changes exon sequence.

exonic_splice_enhancer [SO_0000683]

Exonic splicing enhancers (ESEs) facilitate exon definition by assisting in the recruitment of splicing factors to the adjacent intron.

exonic_splice_region_variant [SO_0002084]

A sequence variant in which a change has occurred within the exonic region of the splice site, 1-2 bases from boundary.

exonic_splicing_silencer [SO_0002058]

An exonic splicing regulatory element that functions to recruit trans-acting splicing factors that suppress the transcription of the gene or genes they control. Requested by Javier Diez Perez.

experimental_feature [SO_0001410]

A region which is the result of some arbitrary experimental procedure. The procedure may be carried out with biological material or inside a computer.

experimental_feature_attribute [SO_0001684]

An attribute of an experimentally derived feature.

experimental_result_region [SO_0000703]

A region of sequence implicated in an experimental result.

experimentally_defined_binding_region [SO_0001696]

A region that has been implicated in binding although the exact coordinates of binding may be unknown.

experimentally_determined [SO_0000312]

Attribute to describe a feature that has been experimentally verified.

expressed_sequence_assembly [SO_0001428]

A sequence assembly derived from expressed sequences. From tracker [ 2372385 ] expressed_sequence_assembly.

expressed_sequence_match [SO_0000102]

A match to an EST or cDNA sequence.

extended_cis_splice_site [SO_0001993]

Intronic positions associated with cis-splicing. Contains the first and second positions immediately before the exon and the first, second and fifth positions immediately after. Added by Andy Menzies (Sanger).

extended_intronic_splice_region [SO_0001996]

Region of intronic sequence within 10 bases of an exon.

extended_intronic_splice_region_variant [SO_0001995]

A sequence variant occurring in the intron, within 10 bases of exon. Added by Andy Menzies (Sanger).
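
The 10-base window in the two definitions above can be illustrated with a small check. This is only a sketch: the function name, the 1-based inclusive coordinates, and the list-of-tuples exon representation are assumptions made for the example and are not part of SO. It also assumes the caller already knows the position lies between two exons of the same transcript.

```python
def in_extended_intronic_splice_region(pos, exons, window=10):
    """Return True if `pos` is non-exonic and within `window` bases of an exon.

    `exons` is a list of (start, end) tuples, 1-based and inclusive
    (an assumed convention). The caller is assumed to pass a position
    that lies inside the transcript, i.e. in an intron if not in an exon.
    """
    # A position inside any exon is exonic, so it cannot qualify.
    for start, end in exons:
        if start <= pos <= end:
            return False
    # Otherwise, check the distance to the nearest edge of each exon.
    for start, end in exons:
        if 1 <= start - pos <= window or 1 <= pos - end <= window:
            return True
    return False
```

With exons at 100-200 and 300-400, positions 201 through 210 and 290 through 299 fall inside the window under these assumptions; position 211 does not.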

external_transcribed_spacer_region [SO_0000640]

Non-coding regions of DNA that precede the sequence that codes for the ribosomal RNA.

extrachromosomal_mobile_genetic_element [SO_0001038]

An MGE that is not integrated into the host chromosome.

extramembrane_polypeptide_region [SO_0001072]

Polypeptide region that is localized outside of a lipid bilayer. Range.

feature_ablation [SO_0001879]

A sequence variant, caused by an alteration of the genomic sequence, where the deletion is greater than the extent of the underlying genomic features. Created in conjunction with the EBI.

feature_amplification [SO_0001880]

A sequence variant, caused by an alteration of the genomic sequence, where the structural change, an amplification of sequence, is greater than the extent of the underlying genomic features. Created in conjunction with the EBI.

feature_attribute [SO_0000733]

An attribute describing a located_sequence_feature.

feature_elongation [SO_0001907]

A sequence variant that causes the extension of a genomic feature, with regard to the reference sequence.

feature_fusion [SO_0001882]

A sequence variant, caused by an alteration of the genomic sequence, where a deletion fuses genomic features. Created in conjunction with the EBI.

feature_translocation [SO_0001881]

A sequence variant, caused by an alteration of the genomic sequence, where the structural change, a translocation, is greater than the extent of the underlying genomic features. Created in conjunction with the EBI.

feature_truncation [SO_0001906]

A sequence variant that causes the reduction of a genomic feature, with regard to the reference sequence.

feature_variant [SO_0001878]

A sequence variant that falls entirely or partially within a genomic feature. Created in conjunction with the EBI.

fingerprint_map [SO_0001250]

A fingerprint_map is a physical map composed of restriction fragments.

finished_genome [SO_0001491]

The status of a whole genome sequence, with less than 1 error per 100,000 base pairs.
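
The error-rate threshold in this definition reduces to a one-line comparison. The sketch below is illustrative only; the function name and the idea of passing an explicit error count are assumptions, since SO does not say how errors are counted.

```python
def meets_finished_error_rate(error_count, genome_length_bp):
    """Return True if there is less than 1 error per 100,000 base pairs."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    if error_count < 0:
        raise ValueError("error_count must not be negative")
    # Integer form of error_count / genome_length_bp < 1 / 100_000,
    # which avoids floating-point rounding at the boundary.
    return error_count * 100_000 < genome_length_bp
```

The comparison is strict, so a sequence with exactly 1 error per 100,000 base pairs does not qualify under this reading.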

five_aminomethyl_two_thiouridine [SO_0001363]

5_aminomethyl_2_thiouridine is a modified uridine base feature.

five_carbamoylmethyl_two_prime_O_methyluridine [SO_0001368]

5_carbamoylmethyl_2_prime_O_methyluridine is a modified uridine base feature.

five_carbamoylmethyluridine [SO_0001367]

5_carbamoylmethyluridine is a modified uridine base feature.

five_carboxyhydroxymethyl_uridine [SO_0001358]

5_carboxyhydroxymethyl_uridine is a modified uridine base feature.

five_carboxyhydroxymethyl_uridine_methyl_ester [SO_0001359]

5_carboxyhydroxymethyl_uridine_methyl_ester is a modified uridine base feature.

five_carboxymethylaminomethyl_two_prime_O_methyluridine [SO_0001370]

5_carboxymethylaminomethyl_2_prime_O_methyluridine is a modified uridine base feature.

five_carboxymethylaminomethyl_two_thiouridine [SO_0001371]

5_carboxymethylaminomethyl_2_thiouridine is a modified uridine base feature.

five_carboxymethylaminomethyluridine [SO_0001369]

5_carboxymethylaminomethyluridine is a modified uridine base feature.

five_carboxymethyluridine [SO_0001374]

5_carboxymethyluridine is a modified uridine base feature.

five_formyl_two_prime_O_methylcytidine [SO_0001293]

5-formyl-2’-O-methylcytidine is a modified cytidine.

five_formylcytidine [SO_0001286]

5-formylcytidine is a modified cytidine.

five_hydroxymethylcytidine [SO_0001292]

5-hydroxymethylcytidine is a modified cytidine.

five_hydroxyuridine [SO_0001354]

5_hydroxyuridine is a modified uridine base feature.

five_isopentenylaminomethyl_two_prime_O_methyluridine [SO_0001382]

5_isopentenylaminomethyl_2_prime_O_methyluridine is a modified uridine base feature.

five_isopentenylaminomethyl_two_thiouridine [SO_0001381]

5_isopentenylaminomethyl_2_thiouridine is a modified uridine base feature.

five_isopentenylaminomethyl_uridine [SO_0001380]

5_isopentenylaminomethyl_uridine is a modified uridine base feature.

five_methoxycarbonylmethyl_two_prime_O_methyluridine [SO_0001361]

5_methoxycarbonylmethyl_2_prime_O_methyluridine is a modified uridine base feature.

five_methoxycarbonylmethyl_two_thiouridine [SO_0001362]

5_methoxycarbonylmethyl_2_thiouridine is a modified uridine base feature.

five_methoxycarbonylmethyluridine [SO_0001360]

5_methoxycarbonylmethyluridine is a modified uridine base feature.

five_methoxyuridine [SO_0001355]

5_methoxyuridine is a modified uridine base feature.

five_methyl_2_thiouridine [SO_0001351]

5_methyl_2_thiouridine is a modified uridine base feature.

five_methylaminomethyl_two_selenouridine [SO_0001366]

5_methylaminomethyl_2_selenouridine is a modified uridine base feature.

five_methylaminomethyl_two_thiouridine [SO_0001365]

5_methylaminomethyl_2_thiouridine is a modified uridine base feature.

five_methylaminomethyluridine [SO_0001364]

5_methylaminomethyluridine is a modified uridine base feature.

five_methylcytidine [SO_0001282]

5-methylcytidine is a modified cytidine.

five_methyldihydrouridine [SO_0001376]

5_methyldihydrouridine is a modified uridine base feature.

five_methyluridine [SO_0001344]

5_methyluridine is a modified uridine base feature.

five_prime_cis_splice_site [SO_0000163]

Intronic 2 bp region bordering the exon, at the 5’ edge of the intron. A splice_site that is downstream_adjacent_to exon and starts intron.

five_prime_clip [SO_0000555]

5’ most region of a precursor transcript that is clipped off during processing.

five_prime_coding_exon [SO_0000200]

The 5’ most coding exon.

five_prime_coding_exon_coding_region [SO_0000196]

The sequence of the five_prime_coding_exon that codes for protein.

five_prime_coding_exon_noncoding_region [SO_0000486]

The sequence of the 5’ exon preceding the start codon.

five_prime_D_heptamer [SO_0000496]

7 nucleotide recombination site (e.g. CACTGTG), part of a 5’ D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene.

five_prime_D_nonamer [SO_0000497]

9 nucleotide recombination site (e.g. GGTTTTTGT), part of a five_prime_D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene.

five_prime_D_recombination_signal_sequence [SO_0000556]

Recombination signal of an immunoglobulin/T-cell receptor gene, including the 5’ D-nonamer (SO:0000497), 5’ D-spacer (SO:0000498), and 5’ D-heptamer (SO:0000496), located 5’ of the D-region of a D-gene, or 5’ of the D-region of a DJ-gene.

five_prime_D_spacer [SO_0000498]

12 or 23 nucleotide spacer between the 5’ D-heptamer (SO:0000496) and 5’ D-nonamer (SO:0000497) of a 5’ D-recombination signal sequence (SO:0000556) of an immunoglobulin/T-cell receptor gene.

five_prime_EST [SO_0001208]

An EST read from the 5’ end of a transcript that usually codes for a protein. These regions tend to be conserved across species and do not change much within a gene family.

five_prime_five_prime_overlap [SO_0000074]

An attribute to describe a gene when the five prime region overlaps with another gene’s five prime region.

five_prime_flanking_region [SO_0001416]

A flanking region located five prime of a specific region.

five_prime_intron [SO_0000190]

An intron that is the most 5-prime in a given transcript.

five_prime_LTR [SO_0000425]

The long terminal repeat found at the five-prime end of the sequence to be inserted into the host genome.

five_prime_LTR_component [SO_0000850]

A component of the five-prime long terminal repeat.

five_prime_noncoding_exon [SO_0000445]

Non-coding exon in the 5’ UTR.

five_prime_open_reading_frame [SO_0000629]

An open reading frame found within the 5’ UTR that can be translated and stall the translation of the downstream open reading frame.

five_prime_recoding_site [SO_1001280]

The recoding stimulatory signal located upstream of the recoding site.

five_prime_restriction_enzyme_junction [SO_0001689]

The restriction enzyme cleavage junction on the 5’ strand of the nucleotide sequence.

five_prime_RST [SO_0001469]

A tag produced from a single sequencing read from a 5’-RACE product; typically a few hundred base pairs long.

five_prime_sticky_end_restriction_enzyme_cleavage_site [SO_0001975]

A restriction enzyme recognition site that, when cleaved, results in 5 prime overhangs. Requested by Jackie Quinn. The sticky restriction sites are different from junctions because they include the sequence that is cut, inclusive of the five prime junction and the three prime junction.

five_prime_terminal_inverted_repeat [SO_0000420]

An inverted repeat (SO:0000294) occurring at the 5-prime termini of a DNA transposon.

five_prime_three_prime_overlap [SO_0000073]

An attribute to describe a gene when the five prime region overlaps with another gene’s 3’ region.

five_prime_UST [SO_0001466]

A UST located in the 5’UTR of a protein-coding transcript.

five_prime_UTR_intron [SO_0000447]

An intron located in the 5’ UTR.

five_prime_UTR_premature_start_codon_location_variant [SO_0001990]

A 5’ UTR variant where a premature start codon is moved.

five_taurinomethyl_two_thiouridine [SO_0001379]

5_taurinomethyl_2_thiouridine is a modified uridine base feature.

five_taurinomethyluridine [SO_0001378]

5_taurinomethyluridine is a modified uridine base feature.

five_two_prime_O_dimethylcytidine [SO_0001287]

5,2’-O-dimethylcytidine is a modified cytidine.

five_two_prime_O_dimethyluridine [SO_0001346]

5_2_prime_O_dimethyluridine is a modified uridine base feature.

fixed_variant [SO_0001768]

When a variant has become fixed in the population so that it is now the only variant.

flanked [SO_0000357]

An attribute describing a region that is bounded either side by a particular kind of region.

flanking_region [SO_0000239]

The sequences extending on either side of a specific region.

flanking_three_prime_quadruplet_recoding_signal [SO_1001281]

Four base pair sequence immediately downstream of the redefined region. The redefined region is a frameshift site. The quadruplet is 2 overlapping codons.

FLEX_element [SO_0001846]

A promoter element that has consensus sequence GTAAACAAACAAAM and contains a heptameric core GTAAACA, bound by transcription factors with a forkhead DNA-binding domain.

floxed_gene [SO_0000363]

A transgene that is floxed.

foldback_element [SO_0000238]

A transposable element with extensive secondary structure, characterized by large modular imperfect long inverted repeats.

foreign [SO_0000784]

An attribute to describe a region from another species.

foreign_gene [SO_0000285]

A gene that is foreign.

foreign_transposable_element [SO_0000720]

A transposable element that is foreign. Requested by Michael on 19 Nov 2004.

forkhead_motif [SO_0001847]

A promoter element with consensus sequence TTTRTTTACA, bound by transcription factors with a forkhead DNA-binding domain.

forward [SO_0001030]

Forward is an attribute of the feature, where the feature is in the 5’ to 3’ direction.

forward_primer [SO_0000121]

A single stranded oligo used for polymerase chain reaction. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

fosmid_clone [SO_0000763]

[fosmid_clone]

four_bp_start_codon [SO_1001269]

A non-canonical start codon with 4 base pairs.

four_cutter_restriction_site [SO_0000244]

[4-cutter_restriction_site; four_cutter_restriction_site; four-cutter_restriction_sit]

four_demethylwyosine [SO_0001341]

4_demethylwyosine is a modified guanosine base feature.

four_thiouridine [SO_0001350]

4_thiouridine is a modified uridine base feature.

fragment_assembly [SO_0001249]

A fragment assembly is a genome assembly that orders overlapping fragments of the genome based on landmark sequences. The base pair distance between the landmarks is known allowing additivity of lengths.

fragmentary [SO_0000731]

An attribute to describe a feature that is incomplete. Term added because of request by MO people.

frame_restoring_sequence_variant [SO_1000110]

A mutation that reverts the sequence of a previous frameshift mutation back to the initial frame.

frame_restoring_variant [SO_0001591]

A sequence variant that reverts the sequence of a previous frameshift mutation back to the initial frame.

frameshift [SO_0000865]

An attribute describing a sequence that contains a mutation involving the deletion or insertion of one or more bases, where this number is not divisible by 3.

Frameshift [SO_0001589]

A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three. EBI term: Frameshift variations - In coding sequence, resulting in a frameshift.
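
The multiple-of-three rule can be written as a short check. This is a minimal sketch that assumes the variant is given as reference and alternate allele strings (a VCF-like representation chosen for the example) and that the variant lies entirely within coding sequence; the function name is hypothetical.

```python
def is_frameshift(ref, alt):
    """Return True if the net number of inserted or deleted bases
    is not a multiple of three.

    `ref` and `alt` are allele strings; the variant is assumed to lie
    within coding sequence.
    """
    # Python's % always returns a non-negative result for a positive
    # divisor, so deletions (negative differences) are handled too.
    return (len(alt) - len(ref)) % 3 != 0
```

A one-base insertion or deletion is a frameshift under this check, while a three-base insertion or deletion, or a simple substitution, is not.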

frameshift_elongation [SO_0001909]

A frameshift variant that causes the translational reading frame to be extended relative to the reference feature.

frameshift_sequence_variation [SO_1000065]

A mutation causing a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three.

frameshift_truncation [SO_0001910]

A frameshift variant that causes the translational reading frame to be shortened relative to the reference feature.

FRE [SO_0002046]

A FRE is an enhancer element necessary and sufficient to confer filamentation associated expression in S. cerevisiae. Requested by Rama, SGD.

free [SO_0001516]

The quality of a duplication where the new region exists independently of the original.

free_chromosome_arm [SO_0000065]

A chromosome structure variation whereby an arm exists as an individual chromosome element.

free_duplication [SO_1000144]

A chromosome structure variation whereby the duplicated sequences are carried as a free centric element.

free_ring_duplication [SO_1000145]

A ring chromosome which is a copy of another chromosome.

FRT_flanked [SO_0000361]

An attribute to describe sequence that is flanked by the FLP recombinase recognition site, FRT.

FRT_site [SO_0000350]

An inversion site found on the Saccharomyces cerevisiae 2 micron plasmid.

functional_candidate_gene [SO_0001869]

A candidate gene whose function has something in common biologically with the trait under investigation. Requested by Bayer Cropscience December, 2011.

functional_variant [SO_0001536]

A variant whereby the effect is evaluated with respect to a reference. Updated after request from Lea Starita, lea.starita@gmail.com from the NCBI.

functionally_abnormal [SO_0002218]

A sequence variant in which the function of a gene product is altered with respect to a reference. Added after request from Lea Starita, lea.starita@gmail.com from the NCBI Feb 2019.

fusion [SO_0000806]

When two regions of DNA are joined together that are not normally together.

G_box [SO_0001980]

A regulatory promoter element identified in mutation experiments, with consensus sequence: CACGTG. Present in promoters, intergenic regions, coding regions, and introns. They are involved in gene expression responses to light and interact with G-box binding factor and I-box binding factor 1a. A plant specific region.

G_to_A_transition [SO_1000016]

A transition of a guanine to an adenine.

G_to_C_transversion [SO_1000026]

A transversion from guanine to cytosine.

G_to_T_transversion [SO_1000027]

A transversion from guanine to thymine.
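
The three G substitutions above differ in whether the replacement base stays in the same chemical class: a transition swaps a purine for a purine (or a pyrimidine for a pyrimidine), and a transversion crosses between the classes. The sketch below classifies a single-base DNA substitution on that basis; the function and constant names are illustrative only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base DNA substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Same chemical class on both sides means a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("G", "A")` gives "transition", while G to C and G to T both give "transversion", matching the three terms above.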

GAGA_motif [SO_0001166]

A non-directional promoter motif with consensus sequence GAGAGCG.

gain_of_function_variant [SO_0002053]

A sequence variant whereby new or enhanced function is conferred on the gene product.

galactosyl_queuosine [SO_0001319]

Galactosyl_queuosine is a modified 7-deazaguanosine.

gamma_turn [SO_0001138]

Gamma turns, defined for 3 residues i, i+1, i+2 if a hydrogen bond exists between residues i and i+2 and the phi and psi angles of residue i+1 fall within 40 degrees.

gamma_turn_classic [SO_0001139]

Gamma turns, defined for 3 residues i, i+1, i+2 if a hydrogen bond exists between residues i and i+2 and the phi and psi angles of residue i+1 fall within 40 degrees: phi(i+1)=75.0 - psi(i+1)=-64.0.

gamma_turn_inverse [SO_0001140]

Gamma turns, defined for 3 residues i, i+1, i+2 if a hydrogen bond exists between residues i and i+2 and the phi and psi angles of residue i+1 fall within 40 degrees: phi(i+1)=-79.0 - psi(i+1)=69.0.
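
The classic and inverse definitions above differ only in the reference angles for residue i+1. The following sketch applies both, assuming the hydrogen bond between residues i and i+2 has already been detected elsewhere and is passed in as a boolean. The function names are hypothetical, and treating "within 40 degrees" as an inclusive bound applied to each angle separately is an interpretation, not something SO states.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_gamma_turn(phi, psi, has_hbond, tol=40.0):
    """Classify residue i+1 of a candidate gamma turn.

    `phi` and `psi` are the dihedral angles of residue i+1 in degrees;
    `has_hbond` says whether residues i and i+2 are hydrogen bonded.
    Returns an SO term name, or None if neither definition applies.
    """
    if not has_hbond:
        return None
    # Reference angles taken from the definitions above.
    if angle_diff(phi, 75.0) <= tol and angle_diff(psi, -64.0) <= tol:
        return "gamma_turn_classic"
    if angle_diff(phi, -79.0) <= tol and angle_diff(psi, 69.0) <= tol:
        return "gamma_turn_inverse"
    return None
```

The wrap-around in `angle_diff` matters only near plus or minus 180 degrees, but it keeps the comparison correct for any input range.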

gap [SO_0000730]

A gap in the sequence of known length. The unknown bases are filled in with N’s.

GATA_box [SO_0001840]

A GATA transcription factor element containing the consensus sequence WGATAR (in which W indicates A/T and R indicates A/G). Changed to is_a SO:0001055 transcriptional_cis_regulatory_region from core_eukaryotic_promoter_element SO:0001660 after Ruth Lovering from GREEKC initiative pointed out that GATA boxes are frequently in enhancer regions, Dave Sant Aug 2020. Moved from is_a SO:0001055 transcriptional_cis_regulatory_region to SO:0000235 TF_binding_site after Colin Logie pointed out that this is a consensus sequence where transcription factors bind, GREEKC Jan 21, 2021.
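
A consensus written with ambiguity codes, such as WGATAR, can be turned into a regular expression for scanning a sequence. The sketch below covers only the two ambiguity codes spelled out in the definition (W = A/T, R = A/G) plus the four plain bases; the function names are illustrative, and only the given strand is searched.

```python
import re

# Only the codes needed for WGATAR; extend the table for other consensus strings.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]",  # W indicates A or T
    "R": "[AG]",  # R indicates A or G
}


def consensus_to_regex(consensus):
    """Compile a consensus string into a regex that finds overlapping matches."""
    body = "".join(IUPAC_TO_REGEX[code] for code in consensus.upper())
    # A zero-width lookahead lets finditer report matches that overlap.
    return re.compile("(?=" + body + ")")


def find_motif(sequence, consensus="WGATAR"):
    """Return 0-based start positions of the consensus on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

To cover both strands, the same search could be repeated on the reverse complement of the sequence.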

GC_rich_promoter_region [SO_0000173]

A conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG.

gene [SO_0000704]

A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. A gene may be considered as a unit of inheritance. A gene is any ‘gene allele’ that produces a functional transcript (i.e. one capable of translation into a protein, or independent functioning as an RNA), when encoded in the genome of some cell or virion.

gene_array [SO_0005851]

An array includes two or more genes, or two or more gene subarrays, contiguously arranged where the individual genes, or subarrays, are either identical in sequence, or essentially so. This would include, for example, a cluster of genes each encoding the major ribosomal RNAs and a cluster of histone gene subarrays.

gene_array_member [SO_0000081]

[gene_attribute; gene array member; gene_array_member]

gene_attribute [SO_0000401]

An attribute describing a gene.

gene_by_genome_location [SO_0000085]

[gene_by_genome_location]

gene_by_organelle_of_genome [SO_0000086]

[gene_by_organelle_of_genome]

gene_by_polyadenylation_attribute [SO_0000066]

[gene_by_polyadenylation_attribute]

gene_by_transcript_attribute [SO_0000064]

This class of attributes was added by MA to allow the broad description of genes based on qualities of the transcript(s). A product of SO meeting 2004. [gene_by_transcript_attribute]

gene_cassette_array [SO_0005854]

An array of non-functional genes whose members, when captured by recombination form functional genes. This would include, for example, the arrays of non-functional VSG genes of Trypanosomes.

gene_cassette_member [SO_0005848]

A gene that is a member of a gene cassette, which is a mobile genetic element.

gene_class [SO_0000009]

[gene_class]

gene_component_region [SO_0000842]

A region of a gene that has a specific function.

gene_fragment [SO_0000997]

A portion of a gene that is not the complete gene. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

gene_fusion [SO_0001565]

A sequence variant whereby two genes have become joined.

gene_group [SO_0005855]

A collection of related genes.

gene_group_regulatory_region [SO_0000752]

A region that is involved in the regulation of transcription of a group of regulated genes. Merged into transcriptional_cis_regulatory_region (SO:0001055) on 11 Feb 2021 as part of GREEKC reducing redundancy as we prepare to submit several terms to Ensembl. See GitHub Issue #529.

gene_member_region [SO_0000831]

A region of a gene. A manufactured term used to allow the parts of a gene to have an is_a path to the root.

gene_not_polyadenylated [SO_0000438]

[gene_not_polyadenylated]

gene_part [SO_0000050]

A part of a gene, that has no other route in the ontology back to region. This concept is necessary for logical inference as these parts must have the properties of region. It also allows us to associate all the parts of genes with a gene.

gene_rearranged_at_DNA_level [SO_0000138]

An epigenetically modified gene, rearranged at the DNA level.

gene_segment [SO_3000000]

A gene component region which acts as a recombinational unit of a gene whose functional form is generated through somatic recombination. Requested by tracker 2021594, July 2008, by Alex.

gene_sensu_your_favorite_organism [SO_0000008]

[gene_sensu_your_favorite_organism]

gene_silenced_by_DNA_methylation [SO_0000129]

A gene that is silenced by DNA methylation.

gene_silenced_by_DNA_modification [SO_0000128]

A gene that is silenced by DNA modification.

gene_silenced_by_histone_deacetylation [SO_0001227]

A gene that is silenced by histone deacetylation.

gene_silenced_by_histone_methylation [SO_0001226]

A gene that is silenced by histone methylation.

gene_silenced_by_histone_modification [SO_0001225]

A gene that is silenced by histone modification.

gene_silenced_by_RNA_interference [SO_0001224]

A gene that is silenced by RNA interference.

gene_subarray [SO_0005852]

A subarray is, by definition, a member of a gene array (SO:0005851); the members of a subarray may differ substantially in sequence, but are closely related in function. This would include, for example, a cluster of genes encoding different histones.

gene_subarray_member [SO_0005849]

A gene that is a member of a group of genes that are either regulated or transcribed together within a larger group of genes that are regulated or transcribed together.

gene_to_gene_feature [SO_0000067]

[gene_to_gene_feature; gene to gene feature; gene_attribute]

gene_trap_construct [SO_0001477]

A construct which is designed to integrate into a genome and produce a fusion transcript between exons of the gene into which it inserts and a reporter element in the construct. Gene traps contain a splice acceptor, do not contain promoter elements for the reporter, and are mutagenic. Gene traps may be bicistronic with the second cassette containing a promoter driving a selectable marker.

gene_variant [SO_0001564]

A sequence variant where the structure of the gene is changed.

gene_with_dicistronic_mRNA [SO_0000722]

A gene that encodes a dicistronic mRNA. Requested by MA, 19 Nov 2004.

gene_with_dicistronic_primary_transcript [SO_0000721]

A gene that encodes a dicistronic primary transcript. Requested by Michael, 19 Nov 2004.

gene_with_dicistronic_transcript [SO_0000692]

A gene that encodes a dicistronic transcript.

gene_with_edited_transcript [SO_0000548]

A gene that encodes a transcript that is edited.

gene_with_mRNA_recoded_by_translational_bypass [SO_0000711]

A gene with mRNA recoded by translational bypass.

gene_with_mRNA_with_frameshift [SO_0000455]

A gene that encodes an mRNA with a frameshift.

gene_with_non_canonical_start_codon [SO_0001739]

A gene with a start codon other than AUG. Requested by flybase, Dec 2010.

gene_with_polyadenylated_mRNA [SO_0000451]

A gene that encodes a polyadenylated mRNA.

gene_with_polycistronic_transcript [SO_0000690]

A gene that encodes a polycistronic transcript.

gene_with_recoded_mRNA [SO_0000693]

A gene that encodes an mRNA that is recoded.

gene_with_start_codon_CUG [SO_0001740]

A gene with a translational start codon of CUG. Requested by flybase, Dec 2010.

gene_with_stop_codon_read_through [SO_0000697]

A gene that encodes a transcript with stop codon readthrough.

gene_with_stop_codon_redefined_as_pyrrolysine [SO_0000698]

A gene encoding an mRNA that has the stop codon redefined as pyrrolysine.

gene_with_stop_codon_redefined_as_selenocysteine [SO_0000710]

A gene encoding an mRNA that has the stop codon redefined as selenocysteine.

gene_with_trans_spliced_transcript [SO_0000459]

A gene with a transcript that is trans-spliced.

gene_with_transcript_with_translational_frameshift [SO_0000712]

A gene encoding a transcript that has a translational frameshift.

genetic_marker [SO_0001645]

A measurable sequence feature that varies within a population.

genic_downstream_transcript_variant [SO_0002152]

A variant that falls downstream of a transcript, but within the genic region of the gene due to alternately transcribed isoforms.

genic_upstream_transcript_variant [SO_0002153]

A variant that falls upstream of a transcript, but within the genic region of the gene due to alternately transcribed isoforms.

genome [SO_0001026]

A genome is the sum of genetic material within a cell or virion. A genome is considered the complement of all heritable sequence features in a given cell or organism (chromosomal or extrachromosomal). This is typically a collection of >1 sequence molecules (e.g. chromosomes), but in some organisms (e.g. bacteria) it may be a single sequence macromolecule (e.g. a circular plasmid). For this reason ‘genome’ classifies under ‘sequence feature complement’.

genomic_clone [SO_0000040]

A clone of a DNA region of a genome.

genomic_DNA [SO_0000991]

DNA located in the genome and able to be transmitted to the offspring. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

genomic_DNA_read [SO_0001828]

A sequencer read of a genomic DNA substrate.

genomically_contaminated_cDNA_clone [SO_0000811]

A cDNA clone invalidated by genomic contamination.

germline_variant [SO_0001778]

A variant present in the embryo that is carried by every cell in the body.

glutamic_acid [SO_0001454]

A negatively charged, hydrophilic amino acid encoded by the codons GAA and GAG. A place holder for a cross product with chebi.

glutamic_acid_tRNA_primary_transcript [SO_0000216]

A primary transcript encoding glutamyl tRNA (SO:0000260).

glutamine [SO_0001448]

A polar, hydrophilic amino acid encoded by the codons CAA and CAG. A place holder for a cross product with chebi.

glutamine_tRNA_primary_transcript [SO_0000217]

A primary transcript encoding glutaminyl tRNA (SO:0000259).

glutaminyl_tRNA [SO_0000259]

A tRNA sequence that has a glutamine anticodon, and a 3’ glutamine binding region.

glutamyl_tRNA [SO_0000260]

A tRNA sequence that has a glutamic acid anticodon, and a 3’ glutamic acid binding region.

glycine [SO_0001443]

A non-polar, hydrophilic amino acid encoded by the codons GGN (GGT, GGC, GGA and GGG). A place holder for a cross product with chebi.

glycine_tRNA_primary_transcript [SO_0000218]

A primary transcript encoding glycyl tRNA (SO:0000261).

glycyl_tRNA [SO_0000261]

A tRNA sequence that has a glycine anticodon, and a 3’ glycine binding region.

GNA [SO_0001192]

An attribute describing a sequence consisting of nucleobases attached to a repeating unit made of an acyclic three-carbon propylene glycol connected to a phosphate backbone. It has two enantiomeric forms, (R)-GNA and (S)-GNA. Do not use this term for feature annotation. Use GNA_oligo (SO:0001193) instead.

golden_path [SO_0000688]

A set of subregions selected from sequence contigs which when concatenated form a nonredundant linear sequence.

golden_path_fragment [SO_0000468]

One of the pieces of sequence that make up a golden path.

gRNA_encoding [SO_0000979]

A non-protein_coding gene that encodes a guide_RNA.

gRNA_gene [SO_0001264]

A noncoding RNA that guides the insertion or deletion of uridine residues in mitochondrial mRNAs. This may also refer to synthetic RNAs used to guide DNA editing using the CRISPR/Cas9 system.

group_1_intron_homing_endonuclease_target_region [SO_0000354]

A region of intronic nucleotide sequence targeted by a nuclease enzyme.

group_II_intron [SO_0000603]

Group II introns are found in rRNA, tRNA and mRNA of organelles in fungi, plants and protists, and also in mRNA in bacteria. They are large self-splicing ribozymes and have 6 structural domains (usually designated dI to dVI). A subset of group II introns also encode essential splicing proteins in intronic ORFs. The length of these introns can therefore be up to 3kb. Splicing occurs in almost identical fashion to nuclear pre-mRNA splicing with two transesterification steps. The 2’ hydroxyl of a bulged adenosine in domain VI attacks the 5’ splice site, followed by nucleophilic attack on the 3’ splice site by the 3’ OH of the upstream exon. Protein machinery is required for splicing in vivo, and long range intron to intron and intron-exon interactions are important for splice site positioning. Group II introns are further sub-classified into groups IIA and IIB which differ in splice site consensus, distance of bulged A from 3’ splice site, some tertiary interactions, and intronic ORF phylogeny. GO:0000373.

group_IIA_intron [SO_0000381]

A group II intron that recognizes IBS1/EBS1 and IBS2/EBS2 for the 5-prime exon and gamma/gamma-prime for the 3-prime exon.

group_IIB_intron [SO_0000382]

A group II intron that recognizes IBS1/EBS1 and IBS2/EBS2 for the 5-prime exon and IBS3/EBS3 for the 3-prime exon.

GT_dinucleotide_repeat [SO_0001862]

A dinucleotide repeat region composed of GT repeating elements. paper:PMID:16043634.

GTT_trinucleotide_repeat [SO_0001863]

A trinucleotide repeat region composed of GTT repeating elements.

guide_RNA [SO_0000602]

A short 3’-uridylated RNA that can form a duplex (except for its post-transcriptionally added oligo_U tail (SO:0000609)) with a stretch of mature edited mRNA.

guide_RNA_region [SO_0000930]

A region of guide RNA.

H_ACA_box_snoRNA [SO_0000594]

Members of the box H/ACA family contain an ACA triplet, exactly 3 nt upstream from the 3’ end and an H-box in a hinge region that links two structurally similar functional domains of the molecule. Both boxes are important for snoRNA biosynthesis and function. A few box H/ACA snoRNAs are involved in rRNA processing; most others are known or predicted to participate in selection of uridine nucleosides in rRNA to be converted to pseudouridines. Site selection is mediated by direct base pairing of the snoRNA with rRNA through one or both targeting domains.

H_ACA_box_snoRNA_encoding [SO_0000608]

A snoRNA that is associated with guiding pseudouridylation. It contains two short conserved sequence motifs: H box (ANANNA) and ACA (ACA).

H_ACA_box_snoRNA_primary_transcript [SO_0000596]

A primary transcript encoding a small nucleolar RNA of the box H/ACA family.

H_pseudoknot [SO_0000592]

A pseudoknot which contains two stems and at least two loops.

H2AK5_acetylation_site [SO_0001938]

A kind of histone modification site, whereby the 5th residue (a lysine), from the start of the H2A histone protein is acetylated.

H2AK9_acetylation_site [SO_0001944]

A kind of histone modification site, whereby the 9th residue (a lysine), from the start of the H2A histone protein is acetylated.

H2AZK11_acetylation_site [SO_0002147]

A kind of histone modification site, whereby the 11th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2AZK13_acetylation_site [SO_0002148]

A kind of histone modification site, whereby the 13th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2AZK15_acetylation_site [SO_0002149]

A kind of histone modification site, whereby the 15th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2AZK4_acetylation_site [SO_0002145]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2AZK7_acetylation_site [SO_0002146]

A kind of histone modification site, whereby the 7th residue (a lysine), from the start of the H2AZ histone protein is acetylated.

H2B_ubiquitination_site [SO_0001717]

A histone modification site on H2B where ubiquitin may be added.

H2BK12_acetylation_site [SO_0001937]

A kind of histone modification site, whereby the 12th residue (a lysine), from the start of the H2B protein is acetylated.

H2BK120_acetylation_site [SO_0001940]

A kind of histone modification site, whereby the 120th residue (a lysine), from the start of the H2B histone protein is acetylated.

H2BK15_acetylation_site [SO_0001946]

A kind of histone modification site, whereby the 15th residue (a lysine), from the start of the H2B histone protein is acetylated.

H2BK20_acetylation_site [SO_0001942]

A kind of histone modification site, whereby the 20th residue (a lysine), from the start of the H2B histone protein is acetylated.

H2BK5_monomethylation_site [SO_0001714]

A kind of histone modification site, whereby the 5th residue (a lysine), from the start of the H2B protein is methylated.

H3K14_acetylation_site [SO_0001704]

A kind of histone modification site, whereby the 14th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K18_acetylation_site [SO_0001718]

A kind of histone modification site, whereby the 18th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K20_trimethylation_site [SO_0001935]

A kind of histone modification site, whereby the 20th residue (a lysine), from the start of the H3 protein is tri-methylated.

H3K23_acetylation_site [SO_0001719]

A kind of histone modification site, whereby the 23rd residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K23_dimethylation_site [SO_0001951]

A kind of histone modification site, whereby the 23rd residue (a lysine), from the start of the H3 protein is di-methylated.

H3K27_acetylation_site [SO_0002049]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is acetylated. Requested by: Sagar Jain, Richard Scheuermann.

H3K27_acylation_site [SO_0001721]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is acylated.

H3K27_dimethylation_site [SO_0001726]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is di-methylated.

H3K27_methylation_site [SO_0001732]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is methylated.

H3K27_monomethylation_site [SO_0001708]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is mono-methylated.

H3K27_trimethylation_site [SO_0001709]

A kind of histone modification site, whereby the 27th residue (a lysine), from the start of the H3 histone protein is tri-methylated.

H3K36_acetylation_site [SO_0001936]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K36_dimethylation_site [SO_0001723]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is dimethylated.

H3K36_methylation_site [SO_0001733]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is methylated.

H3K36_monomethylation_site [SO_0001722]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is mono-methylated.

H3K36_trimethylation_site [SO_0001724]

A kind of histone modification site, whereby the 36th residue (a lysine), from the start of the H3 histone protein is tri-methylated.

H3K4_acetylation_site [SO_0001943]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K4_dimethylation_site [SO_0001725]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H3 histone protein is di-methylated.

H3K4_methylation_site [SO_0001734]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H3 protein is methylated.

H3K4_monomethylation_site [SO_0001705]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H3 protein is mono-methylated.

H3K4_trimethylation [SO_0001706]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H3 protein is tri-methylated.

H3K56_acetylation_site [SO_0001945]

A kind of histone modification site, whereby the 56th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K79_dimethylation_site [SO_0001711]

A kind of histone modification site, whereby the 79th residue (a lysine), from the start of the H3 histone protein is di-methylated.

H3K79_methylation_site [SO_0001735]

A kind of histone modification site, whereby the 79th residue (a lysine), from the start of the H3 histone protein is methylated.

H3K79_monomethylation_site [SO_0001710]

A kind of histone modification site, whereby the 79th residue (a lysine), from the start of the H3 histone protein is mono-methylated.

H3K79_trimethylation_site [SO_0001712]

A kind of histone modification site, whereby the 79th residue (a lysine), from the start of the H3 histone protein is tri-methylated.

H3K9_acetylation_site [SO_0001703]

A kind of histone modification site, whereby the 9th residue (a lysine), from the start of the H3 histone protein is acetylated.

H3K9_dimethylation_site [SO_0001728]

A kind of histone modification site, whereby the 9th residue (a lysine), from the start of the H3 histone protein may be dimethylated.

H3K9_methylation_site [SO_0001736]

A kind of histone modification site, whereby the 9th residue (a lysine), from the start of the H3 histone protein is methylated.

H3K9_monomethylation_site [SO_0001727]

A kind of histone modification site, whereby the 9th residue (a lysine), from the start of the H3 histone protein is mono-methylated.

H3K9_trimethylation_site [SO_0001707]

A kind of histone modification site, whereby the 9th residue (a lysine), from the start of the H3 histone protein is tri-methylated.

H3R2_dimethylation_site [SO_0001948]

A kind of histone modification site, whereby the 2nd residue (an arginine), from the start of the H3 protein is di-methylated.

H3R2_monomethylation_site [SO_0001947]

A kind of histone modification site, whereby the 2nd residue (an arginine), from the start of the H3 protein is mono-methylated.

H4K_acylation_region [SO_0001738]

A region of the H4 histone whereby multiple lysines are acylated.

H4K12_acetylation_site [SO_0001939]

A kind of histone modification site, whereby the 12th residue (a lysine), from the start of the H4 histone protein is acetylated.

H4K16_acetylation_site [SO_0001729]

A kind of histone modification site, whereby the 16th residue (a lysine), from the start of the H4 histone protein is acetylated.

H4K20_monomethylation_site [SO_0001713]

A kind of histone modification site, whereby the 20th residue (a lysine), from the start of the H4 histone protein is mono-methylated.

H4K4_trimethylation_site [SO_0001950]

A kind of histone modification site, whereby the 4th residue (a lysine), from the start of the H4 protein is tri-methylated.

H4K5_acetylation_site [SO_0001730]

A kind of histone modification site, whereby the 5th residue (a lysine), from the start of the H4 histone protein is acetylated.

H4K8_acetylation_site [SO_0001731]

A kind of histone modification site, whereby the 8th residue (a lysine), from the start of the H4 histone protein is acetylated.

H4K91_acetylation_site [SO_0001941]

A kind of histone modification site, whereby the 91st residue (a lysine), from the start of the H4 histone protein is acetylated.

H4R3_dimethylation_site [SO_0001949]

A kind of histone modification site, whereby the 3rd residue (an arginine), from the start of the H4 protein is di-methylated.
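
The residue-level site terms above follow a regular naming pattern: histone, one-letter residue code, position, then modification. The sketch below shows one way such a name might be decomposed. The pattern, the `HistoneSite` type and the `parse_site_name` helper are assumptions made for illustration and are not part of the ontology.

```python
import re
from typing import NamedTuple, Optional

# One-letter residue codes used by the site terms in this listing
RESIDUES = {"K": "lysine", "R": "arginine"}

# Assumed pattern: <histone><residue code><position>_<modification>[_site]
NAME_PATTERN = re.compile(r"^(H\d[A-Z]*)([KR])(\d+)_([a-z]+)(?:_site)?$")


class HistoneSite(NamedTuple):
    histone: str
    residue: str
    position: int
    modification: str


def parse_site_name(name: str) -> Optional[HistoneSite]:
    """Split a residue-level term name into its parts, or return None."""
    m = NAME_PATTERN.match(name)
    if m is None:
        return None
    histone, code, position, modification = m.groups()
    return HistoneSite(histone, RESIDUES[code], int(position), modification)


print(parse_site_name("H3K27_trimethylation_site"))
print(parse_site_name("H4K_acylation_region"))  # not a single-residue term
```

Names that do not identify a single residue, such as H4K_acylation_region or H2B_ubiquitination_site, are deliberately left unmatched by this sketch.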

hammerhead_ribozyme [SO_0000380]

A small catalytic RNA motif that catalyzes a self-cleavage reaction. Its name comes from its secondary structure which resembles a carpenter’s hammer. The hammerhead ribozyme is involved in the replication of some viroid and some satellite RNAs.

haplotype [SO_0001024]

A haplotype is one of a set of coexisting sequence variants of a haplotype block.

haplotype_block [SO_0000355]

A region of the genome which is co-inherited as the result of the lack of historic recombination within it.

helix_turn_helix [SO_0001081]

A motif comprising two helices separated by a turn.

heptamer_of_recombination_feature_of_vertebrate_immune_system_gene [SO_0000561]

Seven nucleotide recombination site (e.g. CACAGTG), part of V-gene, D-gene or J-gene recombination feature of an immunoglobulin or T-cell receptor gene.

heritable_phenotypic_marker [SO_0001500]

A biological_region characterized as a single heritable trait in a phenotype screen. The heritable phenotype may be mapped to a chromosome but generally has not been characterized to a specific gene locus.

HERV_deletion [SO_0002067]

A deletion of the HERV mobile element with respect to a reference.

hetero_compound_chromosome [SO_1000140]

A compound chromosome whereby two arms from different chromosomes are connected through the centromere of one of them.

high_identity_region [SO_0001502]

An experimental feature with high sequence identity to another sequence. Requested by tracker ID: 2902685.

high_quality_draft [SO_0001487]

The status of a whole genome sequence, where overall coverage represents at least 90 percent of the genome.

histidine [SO_0001452]

A positively charged, hydrophilic amino acid encoded by the codons CAT and CAC. A place holder for a cross product with chebi.

histidine_tRNA_primary_transcript [SO_0000219]

A primary transcript encoding histidyl tRNA (SO:0000262).

histidyl_tRNA [SO_0000262]

A tRNA sequence that has a histidine anticodon, and a 3’ histidine binding region.

histone_2A_acetylation_site [SO_0002142]

A histone 2A modification where the modification is the acetylation of the residue.

histone_2AZ_acetylation_site [SO_0002144]

A histone 2AZ modification where the modification is the acetylation of the residue.

histone_2B_acetylation_site [SO_0002143]

A histone 2B modification where the modification is the acetylation of the residue.

histone_3_acetylation_site [SO_0001973]

A histone 3 modification where the modification is the acetylation of the residue.

histone_4_acetylation_site [SO_0001972]

A histone 4 modification where the modification is the acetylation of the residue.

histone_acetylation_site [SO_0001702]

A histone modification where the modification is the acylation of the residue.

histone_acylation_region [SO_0001737]

A histone modification, whereby the histone protein is acylated at multiple sites in a region.

histone_binding_site [SO_0001383]

A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a histone.

histone_methylation_site [SO_0001701]

A histone modification site where the modification is the methylation of the residue.

histone_modification [SO_0001700]

Histone modification is a post translationally modified region whereby residues of the histone protein are modified by methylation, acetylation, phosphorylation, ubiquitination, sumoylation, citrullination, or ADP-ribosylation.

histone_ubiqitination_site [SO_0001716]

A histone modification site where ubiquitin may be added.

homing_endonuclease_binding_site [SO_0001257]

The binding site (recognition site) of a homing endonuclease. The binding site is typically large.

homo_compound_chromosome [SO_1000138]

A compound chromosome whereby two copies of the same chromosomal arm are attached to a common centromere. The chromosome is diploid for the arm involved.

homol_D_box [SO_0001848]

A core promoter element that has the consensus sequence CAGTCACA (or its inverted form TGTGACTG), and plays the role of a TATA box in promoters that do not contain a canonical TATA sequence.

homol_E_box [SO_0001849]

A core promoter element that has the consensus sequence ACCCTACCCT (or its inverted form AGGGTAGGGT), and is found near the homol D box in some promoters that use a homol D box instead of a canonical TATA sequence.
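
The "inverted form" given for the homol D box and homol E box is the reverse complement of the consensus. The following is a minimal sketch of that relationship and of a two-strand exact-match scan. The helper names `reverse_complement` and `find_element` are assumptions made for this example, and real promoter elements may tolerate mismatches that an exact scan misses.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_element(seq, consensus):
    """Return sorted (position, strand) pairs for exact consensus matches.

    '+' marks the consensus itself, '-' its inverted (reverse-complement) form.
    Positions are 0-based on the given sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


print(reverse_complement("CAGTCACA"))    # inverted homol D box
print(reverse_complement("ACCCTACCCT"))  # inverted homol E box
print(find_element("AATGTGACTGAA", "CAGTCACA"))
```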

homologous [SO_0000857]

Similarity due to common ancestry.

homologous_region [SO_0000853]

A region that is homologous to another region.

HSE [SO_0001850]

A promoter element that consists of at least three copies of the pentanucleotide NGAAN, bound by the heat shock transcription factor HSF.

hydrophobic_region_of_peptide [SO_0100013]

Hydrophobic regions are regions with a low affinity for water. Range.

hydroxywybutosine [SO_0001334]

Hydroxywybutosine is a modified guanosine base feature.

hypoploid [SO_0000056]

A kind of chromosome variation where the chromosome complement is not an exact multiple of the haploid number as some chromosomes are missing.

i_motif [SO_0001010]

A cytosine rich domain whereby strands associate both inter- and intramolecularly at moderately acidic pH.

I-box [SO_0001982]

A plant regulatory promoter motif, composed of a highly conserved hexamer GATAAG (I-box core).

iDNA [SO_0000723]

Genomic sequence removed from the genome, as a normal event, by a process of recombination.

IG_C_gene [SO_0002123]

A constant (C) gene, a gene that codes the constant region of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_C_pseudogene [SO_0002100]

A pseudogenic constant region of an immunoglobulin gene which closely resembles a known functional immunoglobulin constant gene but in which the coding region has stop codons, frameshift mutations or a mutation that affects the initiation codon. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_D_gene [SO_0002124]

A gene that rearranges at the DNA level and codes the diversity region of the variable domain of an immunoglobulin (IG) gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_J_gene [SO_0002125]

A joining gene that rearranges at the DNA level and codes the joining region of the variable domain of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_J_pseudogene [SO_0002101]

A pseudogenic joining region which closely resembles a known functional immunoglobulin joining gene but in which the coding region has stop codons, frameshift mutations or a mutation that affects the initiation codon that rearranges at the DNA level and codes the joining region of the variable domain of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_V_gene [SO_0002126]

A variable gene that rearranges at the DNA level and codes the variable region of the variable domain of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

IG_V_pseudogene [SO_0002102]

A pseudogenic variable region which closely resembles a known functional immunoglobulin variable gene but in which the coding region has stop codons, frameshift mutations or a mutation that affects the initiation codon that rearranges at the DNA level and codes the variable region of an immunoglobulin chain. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

immature_peptide_region [SO_0001063]

An immature_peptide_region is the extent of the peptide after it has been translated and before any processing occurs. Range.

immunoglobulin_gene [SO_0002122]

A germline immunoglobulin gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

immunoglobulin_pseudogene [SO_0002098]

A pseudogene derived from an immunoglobulin gene. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes. The terms are defined according to IMGT.

immunoglobulin_region [SO_0001832]

A region of immunoglobulin sequence, either constant or variable.

improved_high_quality_draft [SO_0001488]

The status of a whole genome sequence, where additional work has been performed, using either manual or automated methods, such as gap resolution.

inactive_catalytic_site [SO_0001618]

A sequence variant that causes the inactivation of a catalytic site with respect to a reference sequence.

inactive_ligand_binding_site [SO_0001560]

A sequence variant that causes the inactivation of a ligand binding site with respect to a reference sequence.

incomplete_terminal_codon_variant [SO_0001626]

A sequence variant where at least one base of the final codon of an incompletely annotated transcript is changed. EBI term: Partial codon - Located within the final, incomplete codon of a transcript with a shortened coding sequence where the end is unknown.

incomplete_transcript_3UTR_variant [SO_0002076]

A sequence variant that intersects the 3’ UTR of an incompletely annotated transcript.

incomplete_transcript_5UTR_variant [SO_0002077]

A sequence variant that intersects the 5’ UTR of an incompletely annotated transcript.

incomplete_transcript_CDS [SO_0002081]

A sequence variant that intersects the coding regions of an incompletely annotated transcript.

incomplete_transcript_coding_splice_variant [SO_0002082]

A sequence variant that intersects the coding sequence near a splice region of an incompletely annotated transcript.

incomplete_transcript_exonic_variant [SO_0002080]

A sequence variant that intersects the exon of an incompletely annotated transcript.

incomplete_transcript_intronic_variant [SO_0002078]

A sequence variant that intersects the intron of an incompletely annotated transcript.

incomplete_transcript_splice_region_variant [SO_0002079]

A sequence variant that intersects the splice region of an incompletely annotated transcript.

incomplete_transcript_variant [SO_0002075]

A sequence variant that intersects an incompletely annotated transcript. This term is to map to the ANNOVAR term ‘ncRNA’ (http://annovar.openbioinformatics.org/en/latest/user-guide/gene/). The description in the documentation (11/23/15) is ‘variant overlaps a transcript without coding annotation in the gene definition’, and this is further clarified in the document: ncRNA above refers to RNA without coding annotation. It does not mean that this is an RNA that will never be translated; it merely means that the user-selected gene annotation system was not able to give a coding sequence annotation. It could still code protein products and may have such annotations in future versions of gene annotation or in another gene annotation system. For example, BC039000 is regarded as ncRNA by ANNOVAR when using UCSC Known Gene annotation, but it is regarded as a protein-coding gene by ANNOVAR when using ENSEMBL annotation. It is further clarified in the comments section as: ncRNA does NOT mean conventional non-coding RNA. It means an RNA without complete coding sequence, and it can be a coding RNA that is annotated incorrectly by RefSeq or other gene definition systems.

increased_polyadenylation_variant [SO_0001802]

A transcript processing variant whereby polyadenylation of the encoded transcript is increased with respect to the reference. Term requested by M. Dumontier, June 1 2011.

increased_transcript_level_variant [SO_0001542]

A sequence variant that increases the level of mature, spliced and processed RNA with respect to a reference sequence.

increased_transcript_stability_variant [SO_0001548]

A sequence variant that increases transcript stability with respect to a reference sequence.

increased_transcription_rate_variant [SO_0001551]

A sequence variant that increases the rate of transcription with respect to a reference sequence.

increased_translational_product_level [SO_0001556]

A sequence variant which increases the translational product level with respect to a reference sequence.

independently_known [SO_0000906]

Attribute to describe a feature that is independently known - not predicted.

inducible_promoter [SO_0002051]

A promoter whereby activity is induced by the presence or absence of biotic or abiotic factors.

inframe [SO_0001817]

An attribute describing a sequence that contains a mutation involving the deletion or insertion of one or more bases, where this number is divisible by 3.
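
The divisibility rule in this definition can be written as a short check. This is a minimal sketch, assuming the variant is given as reference and alternate allele strings; the function name `is_inframe` is hypothetical.

```python
def is_inframe(ref, alt):
    """True when the alleles differ in length by a nonzero multiple of three.

    A length difference of zero is a substitution rather than an insertion
    or deletion, so it is not reported as inframe here.
    """
    delta = abs(len(ref) - len(alt))
    return delta > 0 and delta % 3 == 0


print(is_inframe("ACGTA", "AC"))  # three bases deleted
print(is_inframe("A", "AT"))      # one base inserted
```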

inframe_deletion [SO_0001822]

An inframe non synonymous variant that deletes bases from the coding sequence.

inframe_indel [SO_0001820]

A coding sequence variant where the change does not alter the frame of the transcript.

inframe_variant [SO_0001650]

A sequence variant which does not cause a disruption of the translational reading frame.

Initiating Methionine [SO_0001582]

A codon variant that changes at least one base of the first codon of a transcript. This is being used to annotate changes to the first codon of a transcript, when the first annotated codon is not a methionine. A variant is predicted to change the first amino acid of a translation irrespective of the fact that the underlying codon is an AUG. As such for transcripts with an incomplete CDS (sequence does not start with an AUG), it is still called.

inosine [SO_0001230]

A modified RNA base in which hypoxanthine is bound to the ribose ring. The free molecule is CHEBI:17596.

INR_motif [SO_0000014]

A sequence element characteristic of some RNA polymerase II promoters required for the correct positioning of the polymerase for the start of transcription. Overlaps the TSS. The mammalian consensus sequence is YYAN(T|A)YY; the Drosophila consensus sequence is TCA(G|T)t(T|C). In each the A is at position +1 with respect to the TSS. Functionally similar to the TATA box element. Binds TAF1, TAF2.
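
The mammalian consensus YYAN(T|A)YY can be expanded into a regular expression by replacing each IUPAC ambiguity code with a character class, where (T|A) corresponds to the code W. The following is a minimal sketch; the names `consensus_to_regex` and `candidate_tss_positions` are assumptions made for this example, and a match only marks a candidate, not a confirmed initiator.

```python
import re

# Subset of IUPAC nucleotide codes needed for the mammalian consensus
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]",    # pyrimidine
    "W": "[AT]",    # weak pair, written (T|A) in the definition
    "N": "[ACGT]",  # any base
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


MAMMALIAN_INR = consensus_to_regex("YYANWYY")


def candidate_tss_positions(seq):
    """Return 0-based indices of the A (position +1) in each match.

    Uses non-overlapping matches, so overlapping candidates are not reported.
    """
    return [m.start() + 2 for m in MAMMALIAN_INR.finditer(seq.upper())]


print(MAMMALIAN_INR.pattern)
print(candidate_tss_positions("GGTCAGTCCGG"))
```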

INR1_motif [SO_0001163]

A promoter motif with consensus sequence TCATTCG.

insert [SO_0000046]

To insert a subsection of sequence.

insert_AA [SO_0000928]

An edit to insert an AA dinucleotide. The type of RNA editing found in the mitochondria of Myxomycota, characterized by the insertion of mono- and dinucleotides in RNAs relative to their mtDNA template and in addition, C to U base conversion. The most common mononucleotide insertion is cytidine, although a number of uridine mononucleotides are inserted at specific sites. Adenine and guanine have not been observed in mononucleotide insertions. Five different dinucleotide insertions have been observed, GC, GU, CU, AU and AA. Both mono- and dinucleotide insertions create open reading frames in mRNA and contribute to highly conserved structural features of rRNAs and tRNAs.

insert_AU [SO_0000927]

An edit to insert an AU dinucleotide. The type of RNA editing found in the mitochondria of Myxomycota, characterized by the insertion of mono- and dinucleotides in RNAs relative to their mtDNA template and in addition, C to U base conversion. The most common mononucleotide insertion is cytidine, although a number of uridine mononucleotides are inserted at specific sites. Adenine and guanine have not been observed in mononucleotide insertions. Five different dinucleotide insertions have been observed, GC, GU, CU, AU and AA. Both mono- and dinucleotide insertions create open reading frames in mRNA and contribute to highly conserved structural features of rRNAs and tRNAs.

insert_C [SO_0000920]

An edit to insert a cytidine.

insert_CU [SO_0000926]

An edit to insert a CU dinucleotide. The type of RNA editing found in the mitochondria of Myxomycota, characterized by the insertion of mono- and dinucleotides in RNAs relative to their mtDNA template and in addition, C to U base conversion. The most common mononucleotide insertion is cytidine, although a number of uridine mononucleotides are inserted at specific sites. Adenine and guanine have not been observed in mononucleotide insertions. Five different dinucleotide insertions have been observed, GC, GU, CU, AU and AA. Both mono- and dinucleotide insertions create open reading frames in mRNA and contribute to highly conserved structural features of rRNAs and tRNAs.

insert_dinucleotide [SO_0000921]

An edit to insert a dinucleotide.

insert_G [SO_0000923]

An edit to insert a G.

insert_GC [SO_0000924]

An edit to insert a GC dinucleotide. The type of RNA editing found in the mitochondria of Myxomycota, characterized by the insertion of mono- and dinucleotides in RNAs relative to their mtDNA template and in addition, C to U base conversion. The most common mononucleotide insertion is cytidine, although a number of uridine mononucleotides are inserted at specific sites. Adenine and guanine have not been observed in mononucleotide insertions. Five different dinucleotide insertions have been observed, GC, GU, CU, AU and AA. Both mono- and dinucleotide insertions create open reading frames in mRNA and contribute to highly conserved structural features of rRNAs and tRNAs.

insert_GU [SO_0000925]

An edit to insert a GU dinucleotide. The type of RNA editing found in the mitochondria of Myxomycota, characterized by the insertion of mono- and dinucleotides in RNAs relative to their mtDNA template and in addition, C to U base conversion. The most common mononucleotide insertion is cytidine, although a number of uridine mononucleotides are inserted at specific sites. Adenine and guanine have not been observed in mononucleotide insertions. Five different dinucleotide insertions have been observed, GC, GU, CU, AU and AA. Both mono- and dinucleotide insertions create open reading frames in mRNA and contribute to highly conserved structural features of rRNAs and tRNAs.

insert_U [SO_0000917]

An edit to insert a U. The insertion and deletion of uridine (U) residues, usually within coding regions of mRNA transcripts of cryptogenes in the mitochondrial genome of kinetoplastid protozoa.

Insertion [SO_0000667]

The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence.

insertion_attribute [SO_0001512]

A quality of a chromosomal insertion.

insertion_breakpoint [SO_0001414]

The point within a chromosome where an insertion begins or ends.

insertion_site [SO_0000366]

The junction where an insertion occurred.

insertional [SO_0001522]

When a translocation is simply moving genetic material from one chromosome to another.

insertional_duplication [SO_1000154]

A chromosome duplication involving the insertion of a duplicated region (as opposed to a free duplication).

inside_intron [SO_0000069]

An attribute to describe a gene when it is located within the intron of another gene.

inside_intron_antiparallel [SO_0000070]

An attribute to describe a gene when it is located within the intron of another gene and on the opposite strand.

inside_intron_parallel [SO_0000071]

An attribute to describe a gene when it is located within the intron of another gene and on the same strand.
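
The three inside_intron attributes differ only in the strand comparison. The sketch below shows how they might be assigned from coordinates, assuming closed intervals on a shared coordinate system. The `Interval` type and the `nested_gene_attribute` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    start: int   # inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def nested_gene_attribute(intron: Interval, gene: Interval) -> Optional[str]:
    """Return the most specific inside_intron term name, or None.

    `intron` is an intron of the host gene; `gene` is the candidate nested gene.
    """
    # The nested gene must lie entirely within the host intron
    if not (intron.start <= gene.start and gene.end <= intron.end):
        return None
    if gene.strand == intron.strand:
        return "inside_intron_parallel"
    return "inside_intron_antiparallel"


host_intron = Interval(1000, 5000, "+")
print(nested_gene_attribute(host_intron, Interval(2000, 3000, "-")))
```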

insulator_binding_site [SO_0001460]

A binding site that, in an insulator region of a nucleotide molecule, interacts selectively and non-covalently with polypeptide residues. See tracker ID 2060908.

integrase_coding_region [SO_0000369]

[integrase_coding_region]

integrated_mobile_genetic_element [SO_0001039]

An MGE that is integrated into the host chromosome.

integrated_plasmid [SO_0001040]

A plasmid sequence that is integrated within the host chromosome.

integration_excision_site [SO_0000946]

A region specifically recognised by a recombinase, which inserts or removes another region marked by a distinct cognate integration/excision site.

intein_containing [SO_0000729]

An attribute of protein-coding genes where the initial protein product contains an intein.

intein_encoding_region [SO_0002026]

The nucleotide sequence which encodes the intein portion of the precursor gene. Requested by Janos Demeter 2014.

interband [SO_0000450]

A light region between two darkly staining bands in a polytene chromosome.

interchromosomal [SO_0001511]

A change in chromosomes that occurs between two separate chromosomes.

interchromosomal_breakpoint [SO_0001873]

A rearrangement breakpoint between two different chromosomes.

interchromosomal_duplication [SO_0000457]

A chromosome duplication involving an insertion from another chromosome.

interchromosomal_mutation [SO_1000031]

A chromosomal structure variation whereby more than one chromosome is involved.

interchromosomal_translocation [SO_0002060]

A translocation where the regions involved are from different chromosomes.

interchromosomal_transposition [SO_1000155]

A chromosome structure variation whereby a transposition occurred between chromosomes.

intergenic_1kb_variant [SO_0002074]

A variant that falls in an intergenic region that is 1 kb or less between 2 genes. This term is added to map to the ANNOVAR annotation ‘upstream,downstream’.

intergenic_variant [SO_0001628]

A sequence variant located in the intergenic region, between genes. EBI term Intergenic variations - More than 5 kb either upstream or downstream of a transcript.

interior_coding_exon [SO_0000004]

A coding exon that is not the most 3-prime or the most 5-prime in a given transcript.

interior_exon [SO_0000201]

An exon that is bounded by 5’ and 3’ splice sites.

interior_intron [SO_0000191]

An intron that is not the most 3-prime or the most 5-prime in a given transcript.

intermediate [SO_0000933]

An attribute to describe a feature between stages of processing.

intermediate_element [SO_0001677]

A core promoter region of RNA polymerase III type 1 promoters.

internal_eliminated_sequence [SO_0000671]

A sequence eliminated from the genome of ciliates during nuclear differentiation.

internal_feature_elongation [SO_0001908]

A sequence variant that causes the extension of a genomic feature from within the feature rather than from the terminus of the feature, with regard to the reference sequence.

internal_guide_sequence [SO_0001016]

A purine-rich sequence in the group I introns which determines the locations of the splice sites in group I intron splicing and has catalytic activity.

internal_ribosome_entry_site [SO_0000243]

Sequence element that recruits a ribosomal subunit to internal mRNA for translation initiation.

internal_Shine_Dalgarno_sequence [SO_1001260]

A Shine-Dalgarno sequence that stimulates recoding through interactions with the anti-Shine-Dalgarno in the RNA of small ribosomal subunits of translating ribosomes. The signal is only operative in Bacteria.

internal_transcribed_spacer_region [SO_0000639]

Non-coding regions of DNA sequence that separate genes coding for the 28S, 5.8S, and 18S ribosomal RNAs.

internal_UTR [SO_0000241]

A UTR bordered by the terminal and initial codons of two CDSs in a polycistronic transcript. Every UTR is either 5’, 3’ or internal.

intrachromosomal [SO_0001510]

A change in chromosomes that occurs within a single chromosome.

intrachromosomal_breakpoint [SO_0001874]

A rearrangement breakpoint within the same chromosome.

intrachromosomal_duplication [SO_1000038]

A duplication that occurred within a chromosome.

intrachromosomal_mutation [SO_1000028]

A chromosomal structure variation within a single chromosome.

intrachromosomal_translocation [SO_0002061]

A translocation where the regions involved are from the same chromosome.

intrachromosomal_transposition [SO_1000041]

A chromosome structure variation whereby a transposition occurred within a chromosome.

intragenic_variant [SO_0002011]

A variant that occurs within a gene but falls outside of all transcript features. This occurs when alternate transcripts of a gene do not share overlapping sequence. Requested by Pablo Cingolani, for use in SnpEff.

intramembrane_polypeptide_region [SO_0001075]

Polypeptide region present in the lipid bilayer.

intrinsically_unstructured_polypeptide_region [SO_0100003]

A region of polypeptide chain with high conformational flexibility.

introgressed_chromosome_region [SO_0000664]

A region of a chromosome that has been introduced by backcrossing with a separate species.

intron [SO_0000188]

A region of a primary transcript that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

intron_attribute [SO_0000661]

[intron_attribute]

intron_base_5 [SO_0001994]

Fifth intronic position after the intron exon boundary, close to the 5’ edge of the intron.

intron_domain [SO_0001014]

An intronic region that has an attribute. Requested by Colin Batchelor, Feb 2007.

intron_gain_variant [SO_0001573]

A sequence variant whereby an intron is gained by the processed transcript; usually a result of an alteration of the donor or acceptor.

intron_variant [SO_0001627]

A transcript variant occurring within an intron. EBI term: Intronic variations - In intron.

intronic_lncRNA [SO_0001903]

[term replaced by; intronic_lncRNA; sense_intronic_ncRNA]

intronic_regulatory_region [SO_0001492]

A regulatory region that is part of an intron. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

intronic_splice_enhancer [SO_0000320]

Sequences within the intron that modulate splice site selection for some introns.

intronic_splicing_enhancer [SO_0002057]

[ISE; intronic_splicing_enhancer]

intronic_splicing_silencer [SO_0002056]

An intronic splicing regulatory element that functions to recruit trans acting splicing factors that suppress the transcription of the gene or genes they control. Requested by Javier Diez Perez.

invalidated [SO_0000790]

An attribute describing a feature that is invalidated.

invalidated_by_chimeric_cDNA [SO_0000362]

A cDNA clone constructed from more than one mRNA. Usually an experimental artifact.

invalidated_by_genomic_contamination [SO_0000414]

An attribute to describe a feature that is invalidated due to genomic contamination.

invalidated_by_genomic_polyA_primed_cDNA [SO_0000415]

An attribute to describe a feature that is invalidated due to polyA priming.

invalidated_by_partial_processing [SO_0000416]

An attribute to describe a feature that is invalidated due to partial processing.

invalidated_cDNA_clone [SO_0000809]

A cDNA clone that is invalid.

inversion [SO_1000036]

A continuous nucleotide sequence is inverted in the same position.

inversion_attribute [SO_0001517]

When a region of a chromosome is changed to the reverse order without duplication or deletion.

inversion_breakpoint [SO_0001022]

The point within a chromosome where an inversion begins or ends.

inversion_cum_translocation [SO_1000148]

A chromosomal translocation whereby the first two breaks are in the same chromosome, and the region between them is rejoined in inverted order to the other side of the first break, such that both sides of break one are present on the same chromosome. The remaining free ends are joined as a translocation with those resulting from the third break.

inversion_derived_aneuploid_chromosome [SO_0000567]

A chromosome may be generated by recombination between two inversions; presumed to have a deficiency or duplication at each end of the inversion.

inversion_derived_bipartite_deficiency [SO_0000461]

A chromosomal deletion whereby a chromosome is generated by recombination between two inversions; it has a deficiency at each end of the inversion.

inversion_derived_bipartite_duplication [SO_0000547]

A chromosome generated by recombination between two inversions; there is a duplication at each end of the inversion.

inversion_derived_deficiency_plus_aneuploid [SO_0000512]

A chromosomal deletion whereby a chromosome is generated by recombination between two inversions; it has a deficiency at one end and is presumed to have a deficiency or duplication at the other end of the inversion.

inversion_derived_deficiency_plus_duplication [SO_0000465]

A chromosome deletion whereby a chromosome is generated by recombination between two inversions; there is a deficiency at one end of the inversion and a duplication at the other end of the inversion.

inversion_derived_duplication_plus_aneuploid [SO_0000549]

A chromosome generated by recombination between two inversions; has a duplication at one end and presumed to have a deficiency or duplication at the other end of the inversion.

inversion_site [SO_0000948]

A region specifically recognised by a recombinase, which inverts the region flanked by a pair of sites. A target region for site-specific inversion of a DNA region and which carries binding sites for a site-specific recombinase and accessory proteins as well as the site for specific cleavage by the recombinase.

inversion_site_part [SO_0001048]

A region located within an inversion site. A term created to allow the parts of an inversion site to have an is_a path back to the root.

invert [SO_0000047]

To invert a subsection of sequence.

inverted [SO_0001515]

A quality of an insertion where the insert is in a cytologically inverted orientation.

inverted_insertional_duplication [SO_1000153]

An insertional duplication where a copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments.

inverted_interchromosomal_transposition [SO_1000156]

An interchromosomal transposition whereby a copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segment.

inverted_intrachromosomal_transposition [SO_1000158]

An intrachromosomal transposition whereby the segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments.

inverted_repeat [SO_0000294]

The sequence is complementarily repeated on the opposite strand. It is a palindrome, and it may, or may not be hyphenated. Examples: GCTGATCAGC, or GCTGA-----TCAGC.
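
The palindrome property in this definition means the sequence equals its own reverse complement. The sketch below tests that, assuming plain A/C/G/T input with any spacer written as hyphens; the function names are illustrative and not part of SO.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(seq: str, gap: str = "-") -> bool:
    """True if seq, ignoring gap characters, equals its own reverse complement."""
    core = seq.replace(gap, "").upper()
    return len(core) > 0 and core == reverse_complement(core)
```

Both forms of the definition's example pass this check, while a sequence such as GATTACA does not.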

inverted_ring_chromosome [SO_0000439]

A ring chromosome is a chromosome whose arms have fused together to form a ring in an inverted fashion, often with the loss of the ends of the chromosome.

inverted_tandem_duplication [SO_1000040]

A tandem duplication where the individual regions are not in the same orientation.

IRLinv_site [SO_0001046]

Component of the inversion site located at the left of a region susceptible to site-specific inversion.

iron_repressed_GATA_element [SO_0001851]

A GATA promoter element with consensus sequence WGATAA, found in promoters of genes repressed in the presence of iron. The synonym IDP (GATA) is found in an annotation but un-traced as far as literature goes.

iron_responsive_element [SO_0001182]

A regulatory sequence found in the 5’ and 3’ UTRs of many mRNAs which encode iron-binding proteins. It has a hairpin structure and is recognized by trans-acting proteins known as iron-regulatory proteins.

IRRinv_site [SO_0001047]

Component of the inversion site located at the right of a region susceptible to site-specific inversion.

isoleucine [SO_0001438]

A non-polar, hydrophobic amino acid encoded by the codons ATH (ATT, ATC and ATA). A place holder for a cross product with chebi.
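
The shorthand ATH uses the IUPAC ambiguity code H, which stands for A, C or T. As a small sketch, a degenerate codon can be expanded into its concrete codons as follows; `expand_codon` is a hypothetical helper name, and the table is the standard IUPAC nucleotide code set.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(codon: str) -> list:
    """List every concrete codon matched by a degenerate IUPAC codon."""
    choices = (IUPAC[base] for base in codon.upper())
    return ["".join(combo) for combo in product(*choices)]
```

The same helper applies to other entries written this way, such as the CTN codons listed for leucine.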

isoleucine_tRNA_primary_transcript [SO_0000220]

A primary transcript encoding isoleucyl tRNA (SO:0000263).

isoleucyl_tRNA [SO_0000263]

A tRNA sequence that has an isoleucine anticodon, and a 3’ isoleucine binding region.

isowyosine [SO_0001342]

Isowyosine is a modified guanosine base feature.

ISRE [SO_0001715]

An ISRE is a transcriptional cis regulatory region, containing the consensus region: YAGTTTC(A/T)YTTTYCC, responsible for increased transcription via interferon binding. Term requested via tracker (2981725) by Alan Ruttenberg, April 2010. It has been described as both an enhancer and a promoter, so the parent is the more general term. Moved from is_a SO:0001055 transcriptional_cis_regulatory_region to SO:0000235 TF_binding_site after Colin Logie pointed out that this is a consensus sequence where transcription factors bind, GREEKC Jan 21, 2021.
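
A consensus written this way can be turned into a regular expression for a simple exact scan. The sketch below assumes IUPAC ambiguity letters plus the "(A/T)" alternative notation used above; the helper names are hypothetical, and exact consensus matching is only a rough stand-in for the scored motif models that binding-site prediction usually relies on.

```python
import re

# IUPAC ambiguity letters expressed as regex character classes.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

# A token is either an explicit alternative such as "(A/T)" or a single letter.
TOKEN = re.compile(r"\(([ACGT](?:/[ACGT])+)\)|([A-Z])")


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus string into a regex pattern."""
    parts = []
    for m in TOKEN.finditer(consensus):
        if m.group(1):
            parts.append("[" + m.group(1).replace("/", "") + "]")
        else:
            parts.append(IUPAC_CLASS[m.group(2)])
    return "".join(parts)


ISRE = re.compile(consensus_to_regex("YAGTTTC(A/T)YTTTYCC"))


def find_isre_sites(seq: str) -> list:
    """Return 0-based start positions of exact consensus matches on the given strand."""
    return [m.start() for m in ISRE.finditer(seq.upper())]
```

Only the given strand is scanned; searching the reverse complement as well would be a natural extension.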

J_C_cluster [SO_0000511]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including at least one J-gene and one C-gene.

J_cluster [SO_0000513]

Genomic DNA of immunoglobulin/T-cell receptor gene in germline configuration including more than one J-gene.

J_gene_recombination_feature [SO_0000302]

Recombination signal including J-heptamer, J-spacer and J-nonamer in 5’ of J-region of a J-gene or J-sequence.

J_gene_segment [SO_0000470]

Germline genomic DNA of an immunoglobulin/T-cell receptor gene including J-region with 5’ UTR (SO:0000204) and 3’ UTR (SO:0000205), also designated as J-segment.

J_heptamer [SO_0000515]

7 nucleotide recombination site (e.g. CACAGTG), part of a J-gene recombination feature of an immunoglobulin/T-cell receptor gene.

J_nonamer [SO_0000514]

9 nucleotide recombination site (e.g. GGTTTTTGT), part of a J-gene recombination feature of an immunoglobulin/T-cell receptor gene.

J_spacer [SO_0000517]

12 or 23 nucleotide spacer between the J-nonamer and the J-heptamer of a J-gene recombination feature of an immunoglobulin/T-cell receptor gene.
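
The three J-gene recombination components above can be pictured as one composite pattern. The sketch below assumes the order nonamer, spacer, heptamer along the sequence and searches only for the example heptamer and nonamer quoted in these entries; real recombination signals are more variable, so this is an illustration and not a usable annotation rule.

```python
import re

# Example sequences quoted in the J_nonamer and J_heptamer entries.
J_NONAMER = "GGTTTTTGT"
J_HEPTAMER = "CACAGTG"

# Nonamer, then a spacer of exactly 12 or 23 bases, then heptamer.
J_RSS = re.compile(
    J_NONAMER + r"(?P<spacer>[ACGT]{12}|[ACGT]{23})" + J_HEPTAMER
)


def find_j_rss(seq: str) -> list:
    """Return (start, spacer_length) for each exact match of the pattern."""
    return [(m.start(), len(m.group("spacer")))
            for m in J_RSS.finditer(seq.upper())]
```

A spacer of any other length, for example 15 bases, produces no match.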

junction [SO_0000699]

A sequence_feature with an extent of zero. A junction is a boundary between regions. A boundary has an extent of zero.

KEN_box [SO_0001807]

A conserved polypeptide motif that can be recognized by FZR/Cdh1-activated anaphase-promoting complex/cyclosome (APC/C) and targets a protein for ubiquitination and subsequent degradation by the APC/C. The consensus sequence is KENXXXN.

kinetoplast_gene [SO_0000089]

A gene located in kinetoplast sequence.

kozak_sequence [SO_0001647]

A kind of ribosome entry site, specific to Eukaryotic organisms that overlaps part of both 5’ UTR and CDS sequence.

L_box [SO_0001981]

An orientation dependent regulatory promoter element, with consensus sequence of TTGCACAN4TTGCACA, found in plants.

L1_LINE_retrotransposon [SO_0002272]

Long interspersed element-1 (LINE-1) elements are found in the human genome, which contains ORF1 (open reading frame1, including CC, coiled coil; RRM, RNA recognition motif; CTD, carboxyl-terminal domain) and ORF2 (including EN, endonuclease; RT, reverse transcriptase; C, cysteine-rich domain). The L1-encoded proteins (ORF1p and ORF2p) can mobilize nonautonomous retrotransposons, other noncoding RNAs, and messenger RNAs. Added as per GitHub Issue Request #488 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/488)

laevosynaptic_chromosome [SO_1000143]

LS is an autosynaptic chromosome carrying the two left (L = levo) telomeres.

lambda_clone [SO_0000160]

A linear clone derived from lambda bacteriophage. The genes involved in the lysogenic pathway are removed from the viral DNA. Up to 25 kb of foreign DNA can then be inserted into the lambda genome.

lambda_vector [SO_0000754]

The lambda bacteriophage is the vector for the linear lambda clone. The genes involved in the lysogenic pathway are removed from the viral DNA. Up to 25 kb of foreign DNA can then be inserted into the lambda genome.

lariat_intron [SO_0001958]

A kind of intron whereby the excision is driven by lariat formation. Requested by PomBase 3604508.

late_origin_of_replication [SO_0002141]

An origin of replication that initiates late in S phase.

left_handed_peptide_helix [SO_0001115]

A left handed helix is a region of peptide where the coiled conformation turns in an anticlockwise, left handed screw.

lethal_variant [SO_0001773]

A sequence variant where the mutated gene product does not allow for one or more basic functions necessary for survival.

leucine [SO_0001437]

A non-polar, hydrophobic amino acid encoded by the codons CTN (CTT, CTC, CTA and CTG), TTA and TTG. A place holder for a cross product with chebi.

leucine_tRNA_primary_transcript [SO_0000221]

A primary transcript encoding leucyl tRNA (SO:0000264).

leucoplast_chromosome [SO_0000823]

A chromosome with origin in a leucoplast.

leucoplast_gene [SO_0000095]

A plastid gene from leucoplast sequence.

leucoplast_sequence [SO_0000747]

DNA belonging to the genome of a leucoplast, a colorless plastid generally containing starch or oil.

leucyl_tRNA [SO_0000264]

A tRNA sequence that has a leucine anticodon, and a 3’ leucine binding region.

level_of_transcript_variant [SO_0001540]

A sequence variant which alters the level of a transcript.

ligand_binding_site [SO_0001657]

A binding site that, in the molecule, interacts selectively and non-covalently with a small molecule such as a drug, or hormone.

ligation_based_read [SO_0001425]

A read produced by ligation based sequencing technologies. An example of this kind of read is one produced by ABI SOLiD.

lincRNA [SO_0001463]

Long, intervening non-coding RNA. A transcript that does not overlap within the start or end genomic coordinates of a coding gene or pseudogene on either strand.

lincRNA_gene [SO_0001641]

A gene that encodes a long, intervening non-coding RNA.

LINE_element [SO_0000194]

A dispersed repeat family with many copies, each from 1 to 6 kb long. New elements are generated by retroposition of a transcribed copy. Typically the LINE contains two ORFs, one of which encodes reverse transcriptase, and 3’ and 5’ direct repeats.

LINE1_deletion [SO_0002069]

A deletion of a LINE1 mobile element with respect to a reference.

LINE1_insertion [SO_0002064]

An insertion from the Line1 family of mobile elements.

linear [SO_0000987]

A quality of a nucleotide polymer that has a 3’-terminal residue and a 5’-terminal residue. Attributes added to describe the different kinds of replicon. SO workshop, September 2006.

linear_double_stranded_DNA_chromosome [SO_0000957]

Structural unit composed of a self-replicating, double-stranded, linear DNA molecule.

linear_double_stranded_RNA_chromosome [SO_0000964]

Structural unit composed of a self-replicating, double-stranded, linear RNA molecule.

linear_single_stranded_DNA_chromosome [SO_0000959]

Structural unit composed of a self-replicating, single-stranded, linear DNA molecule.

linear_single_stranded_RNA_chromosome [SO_0000963]

Structural unit composed of a self-replicating, single-stranded, linear RNA molecule.

linkage_group [SO_0000018]

A group of loci that can be grouped in a linear order representing the different degrees of linkage among the genes concerned.

lipoprotein_signal_peptide [SO_0100009]

A peptide that acts as a signal for both membrane translocation and lipid attachment in prokaryotes.

LNA [SO_0001188]

An attribute describing a sequence consisting of nucleobases attached to a repeating unit made of ’locked’ deoxyribose rings connected to a phosphate backbone. The deoxyribose unit’s conformation is ’locked’ by a 2’-C,4’-C-oxymethylene link. Do not use this term for feature annotation. Use LNA_oligo (SO:0001189) instead.

lnc_RNA [SO_0001877]

A non-coding RNA over 200 nucleotides in length.

lncRNA_gene [SO_0002127]

A gene that encodes a long non-coding RNA. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes.

lncRNA_primary_transcript [SO_0002035]

A primary transcript encoding a lncRNA.

lncRNA_with_retained_intron [SO_0002113]

A lncRNA transcript containing a retained intron. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

long_terminal_repeat [SO_0000286]

A sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses.

loR [SO_0002033]

A short, non coding transcript of loop-derived sequences encoded in precursor miRNA. MoRs are generated from miR hairpins that are longer and can produce two functional miR per strand. They are called moRs because they are not located next to the loop and thus their biogenesis process is a little different, but functionally, they are supposed to act like miRs. It is the same for loRs that are the loop fragments, they are generated differently than miRs or moRs but if loaded into the risc they are supposed to act the same way miRs do. Requested by Thomas Desvignes, Jan 2015.

loss_of_function_variant [SO_0002054]

A sequence variant whereby the gene product has diminished or abolished function.

loss_of_heterozygosity [SO_0001786]

A functional variant whereby the sequence alteration causes a loss of function of one allele of a gene.

low_complexity [SO_0001004]

When a sequence does not contain an equal distribution of all four possible nucleotide bases or does not contain all nucleotide bases.

low_complexity_region [SO_0001005]

A region where the DNA does not contain an equal distribution of all four possible nucleotides or does not contain all four nucleotides.
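
Taken literally, this definition is a statement about base composition. The sketch below flags a sequence when a base is absent or when any base frequency strays from 0.25 by more than a tolerance. The `tolerance` value and the function names are assumptions for illustration; dedicated low-complexity masking tools use their own scoring schemes.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the fraction of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}


def looks_low_complexity(seq: str, tolerance: float = 0.15) -> bool:
    """True if a base is missing or the composition deviates strongly from 25% each."""
    comp = base_composition(seq)
    missing = any(frac == 0.0 for frac in comp.values())
    skewed = any(abs(frac - 0.25) > tolerance for frac in comp.values())
    return missing or skewed
```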

loxP_site [SO_0000346]

Cre-Recombination target sequence.

LTR_component [SO_0000848]

The long terminal repeat found at the ends of the sequence to be inserted into the host genome.

LTR_retrotransposon [SO_0000186]

A retrotransposon flanked by long terminal repeat sequences.

lysine [SO_0001450]

A positively charged, hydrophilic amino acid encoded by the codons AAA and AAG. A place holder for a cross product with chebi.

lysine_tRNA_primary_transcript [SO_0000222]

A primary transcript encoding lysyl tRNA (SO:0000265).

lysosomal_localization_signal [SO_0001530]

A polypeptide region that targets a polypeptide to the lysosome.

lysyl_tRNA [SO_0000265]

A tRNA sequence that has a lysine anticodon, and a 3’ lysine binding region.

M26_binding_site [SO_0001900]

[CRE; term replaced by; M26_binding_site]

macronuclear_chromosome [SO_0000824]

A chromosome originating in a macronucleus.

macronuclear_sequence [SO_0000083]

DNA belonging to the macronuclei of ciliates.

macronucleus_destined_segment [SO_0000672]

A sequence that is conserved, although rearranged relative to the micronucleus, in the macronucleus of a ciliate genome.

major_TSS [SO_0001238]

The transcription start site that is most frequently used for transcription of a gene.

mannosyl_queuosine [SO_0001320]

Mannosyl_queuosine is a modified 7-deazaguanosine.

Mat2P [SO_0002157]

A gene cassette array containing H+ mating type specific information.

Mat3M [SO_0002158]

A gene cassette array containing H- mating type specific information.

match [SO_0000343]

A region of sequence, aligned to another sequence with some statistical significance, using an algorithm such as BLAST or SIM4.

match_part [SO_0000039]

A part of a match, for example an hsp from blast is a match_part.

match_set [SO_0000038]

A collection of match parts.

maternal_uniparental_disomy [SO_0001745]

Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from the mother and no copies of the same chromosome or region from the father.

maternal_variant [SO_0001775]

A variant in the genetic material inherited from the mother.

maternally_imprinted [SO_0000135]

The maternal copy of the gene is modified, rendering it transcriptionally silent.

maternally_imprinted_gene [SO_0000888]

A gene that is maternally_imprinted.

mathematically_defined_repeat [SO_0001642]

A mathematically defined repeat (MDR) is an experimental feature that is determined by querying overlapping oligomers of length k against a database of shotgun sequence data and identifying regions in the query sequence that exceed a statistically determined threshold of repetitiveness. Mathematically defined repeat regions are determined without regard to the biological origin of the repetitive region. The repeat units of an MDR are the overlapping oligomers of size k that were used for the query. Tools that can annotate mathematically defined repeats include Tallymer (Kurtz et al 2008, BMC Genomics: 517) and RePS (Wang et al, Genome Res 12(5): 824-831.).
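
The procedure described here (count k-mers in shotgun data, then flag query regions whose k-mers exceed a threshold) can be sketched in a few lines. This is a toy version under stated assumptions: a fixed count threshold stands in for the statistically determined one, the function names are hypothetical, and it does not reproduce how Tallymer or RePS work internally.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def repetitive_regions(query: str, counts: Counter, k: int,
                       threshold: int) -> List[Tuple[int, int]]:
    """Return merged half-open (start, end) query intervals covered by
    k-mers whose count in the read set exceeds the threshold."""
    query = query.upper()
    regions: List[Tuple[int, int]] = []
    for i in range(len(query) - k + 1):
        if counts[query[i:i + k]] > threshold:
            if regions and i <= regions[-1][1]:
                # Overlaps or abuts the previous interval, so extend it.
                regions[-1] = (regions[-1][0], i + k)
            else:
                regions.append((i, i + k))
    return regions
```

Because the counts come only from sequence data, the flagged regions carry no claim about biological origin, which matches the definition.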

mating_type_M_box [SO_0001852]

A promoter element with consensus sequence ACAAT, found in promoters of mating type M-specific genes in fission yeast and bound by the transcription factor Mat1-Mc. Note that this should not be confused with the M-box that has consensus sequence CATGTG and is bound by bHLH transcription factors such as MITF.

mating_type_region [SO_0001789]

A specialized region in the genomes of some yeast and fungi, the genes of which regulate mating type.

mating_type_region_motif [SO_0001999]

DNA motif that is a component of a mating type region.

mating_type_region_replication_fork_barrier [SO_0002021]

A DNA motif that is found in eukaryotic rDNA repeats, and is a site of replication fork pausing. Requested by Midori Harris.

matrix_attachment_site [SO_0000036]

A DNA region that is required for the binding of chromatin to the nuclear matrix.

mature_miRNA_variant [SO_0001620]

A transcript variant located within the sequence of the mature miRNA. EBI term: Within mature miRNA - Located within a microRNA.

mature_protein_region [SO_0000419]

The polypeptide sequence that remains when the cleaved peptide regions have been cleaved from the immature peptide. The term mature peptide was merged with the biosapiens term mature protein region, which was taken as the new name. Old def: The coding sequence for the mature or final peptide or protein product following post-translational modification.

mature_protein_region_of_CDS [SO_0002249]

A CDS region corresponding to a mature protein region of a polypeptide. Added as per request from GitHub Issue #484 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/484)

mature_transcript_region [SO_0000834]

A region of a mature transcript. A manufactured term to collect together the parts of a mature transcript and give them an is_a path to the root.

maxicircle [SO_0000742]

A maxicircle is a replicon, part of a kinetoplast, that contains open reading frames and replicates via a rolling circle method.

maxicircle_gene [SO_0000654]

A mitochondrial gene located in a maxicircle.

MCB [SO_0001855]

A promoter element with consensus sequence ACGCGT, bound by the transcription factor complex MBF (MCB-binding factor) and found in promoters of genes expressed during the G1/S transition of the cell cycle.

meiotic_recombination_region [SO_0002155]

A genomic region in which there is an exchange of genetic material as a result of the repair of meiosis-specific double strand breaks that occur during meiotic prophase.

member_of_regulon [SO_1001217]

A gene that is a member of a group of genes that are either regulated or transcribed together.

membrane_peptide_loop [SO_0001076]

Polypeptide region localized within the lipid bilayer where both ends traverse the same membrane.

membrane_structure [SO_0001071]

Arrangement of the polypeptide with respect to the lipid bilayer. Range.

metabolic_island [SO_0000774]

A transmissible element containing genes involved in metabolism, analogous to the pathogenicity islands of gram negative bacteria. Genes for phenolic compound degradation in Pseudomonas putida are found on metabolic islands.

metal_binding_site [SO_0001656]

A binding site that, in the molecule, interacts selectively and non-covalently with metal ions. See GO:0046872 : metal ion binding.

methionine [SO_0001442]

A non-polar, hydrophobic amino acid encoded by the codon ATG. A place holder for a cross product with chebi.

methionine_tRNA_primary_transcript [SO_0000223]

A primary transcript encoding methionyl tRNA (SO:0000266).

methionyl_tRNA [SO_0000266]

A tRNA sequence that has a methionine anticodon, and a 3’ methionine binding region.

methylated_adenine [SO_0000161]

A modified base in which adenine has been methylated.

methylated_cytosine [SO_0000114]

A methylated deoxy-cytosine.

methylated_DNA_base_feature [SO_0000306]

A nucleotide modified by methylation.

methylation_guide_snoRNA [SO_0005841]

A snoRNA that specifies the site of 2’-O-ribose methylation in an RNA molecule by base pairing with a short sequence around the target residue. Has RNA 2’-O-ribose methylation guide activity (GO:0030561).

methylation_guide_snoRNA_primary_transcript [SO_0000580]

A primary transcript encoding a methylation guide small nucleolar RNA.

methylinosine [SO_0001233]

A modified RNA base in which methylhypoxanthine is bound to the ribose ring.

methylwyosine [SO_0001337]

Methylwyosine is a modified guanosine base feature.

microarray_oligo [SO_0000328]

A DNA sequence used experimentally to detect the presence or absence of a complementary nucleic acid.

micronuclear_chromosome [SO_0000825]

A chromosome originating in a micronucleus.

micronuclear_sequence [SO_0000084]

DNA belonging to the micronuclei of a cell.

mini_exon_donor_RNA [SO_0000635]

A primary transcript that donates the spliced leader to other mRNA.

mini_gene [SO_0000815]

By definition, minigenes are short open-reading frames (ORF), usually encoding approximately 9 to 20 amino acids, which are expressed in vivo (as distinct from being synthesized as peptide or protein ex vivo and subsequently injected). The in vivo synthesis confers a distinct advantage: the expressed sequences can enter both antigen presentation pathways, MHC I (inducing CD8+ T-cells, which are usually cytotoxic T-lymphocytes (CTL)) and MHC II (inducing CD4+ T-cells, usually ‘T-helpers’ (Th)); and can encounter B-cells, inducing antibody responses. Three main vector approaches have been used to deliver minigenes: viral vectors, bacterial vectors and plasmid DNA.

minicircle [SO_0000980]

A minicircle is a replicon, part of a kinetoplast, that encodes for guide RNAs.

minicircle_gene [SO_0000975]

A gene found within a minicircle.

minor_TSS [SO_0001239]

A transcription start site that is not the most frequently used for transcription of a gene.

minus_1_frameshift [SO_0000866]

A frameshift caused by deleting one base.

minus_1_frameshift_variant [SO_0001592]

A sequence variant which causes a disruption of the translational reading frame, by shifting one base ahead.

minus_1_translationally_frameshifted [SO_1001262]

An attribute describing a translational frameshift of -1.

minus_12_signal [SO_0001673]

A conserved region about 12-bp upstream of the start point of bacterial transcription units, involved with sigma factor 54.

minus_2_frameshift [SO_0000867]

A frameshift caused by deleting two bases.

minus_2_frameshift_variant [SO_0001593]

A sequence variant which causes a disruption of the translational reading frame, by shifting two bases forward.

minus_24_signal [SO_0001674]

A conserved region about 24-bp upstream of the start point of bacterial transcription units, involved with sigma factor 54.

minus_35_signal [SO_0000176]

A conserved hexamer about 35-bp upstream of the start point of bacterial transcription units; consensus=TTGACa or TGTTGACA. This region is associated with sigma factor 70. Changed from is_a SO:0000713 DNA_motif to is_a SO:0002312 core_prokaryotic_promoter_element in response to GREEKC Initiative Dave Sant Aug 2020. Changed from is_a SO:0002312 core_prokaryotic_promoter_element back to is_a SO:0000713 DNA_motif to be consistent with minus_12_signal and minus_24_signal on 12 July 2021.

miR_encoding_lncRNA_primary_transcript [SO_0002036]

A lncRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_shRNA_primary_transcript [SO_0002039]

A shRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_snoRNA_primary_transcript [SO_0002034]

A snoRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_tRNA_primary_transcript [SO_0002037]

A tRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_vaultRNA_primary_transcript [SO_0002041]

A vaultRNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miR_encoding_Y_RNA_primary_transcript [SO_0002043]

A Y-RNA primary transcript that also encodes pre-miR sequence that is processed to form functionally active miRNA.

miRNA_antiguide [SO_0001473]

A region of the pri miRNA that base pairs with the guide to form the hairpin.

miRNA_encoding [SO_0000571]

A region that can be transcribed into a microRNA (miRNA).

miRNA_gene [SO_0001265]

A small noncoding RNA of approximately 22 nucleotides in length which may be involved in regulation of gene expression. Moved from ncRNA_gene to sncRNA_gene 27 April 2021 to be more consistent with the organization of the ncRNA branch of SO. Requested by FlyBase, moved by Dave Sant. See GitHub Issue #514.

miRNA_loop [SO_0001246]

The loop of the hairpin loop formed by folding of the pre-miRNA.

miRNA_primary_transcript [SO_0000647]

A primary transcript encoding a micro RNA.

miRNA_primary_transcript_region [SO_0001243]

A part of an miRNA primary_transcript.

miRNA_stem [SO_0001245]

The stem of the hairpin loop formed by folding of the pre-miRNA.

miRNA_target_site [SO_0000934]

A miRNA target site is a binding site where the molecule is a micro RNA.

miRtron [SO_0001034]

A de-branched intron which mimics the structure of pre-miRNA and enters the miRNA processing pathway without Drosha mediated cleavage. Ruby et al. Nature 448:83 describe a new class of miRNAs that are derived from de-branched introns.

missense_variant [SO_0001583]

A sequence variant, that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved. EBI term: Non-synonymous SNPs. SNPs that are located in the coding sequence and result in an amino acid change in the encoded peptide sequence. A change that causes a non_synonymous_codon can be more than 3 bases - for example 4 base substitution.

MITE [SO_0000338]

A highly repetitive and short (100-500 base pair) transposable element with terminal inverted repeats (TIR) and target site duplication (TSD). MITEs do not encode proteins.

mitochondrial_chromosome [SO_0000819]

A chromosome originating in a mitochondrion.

mitochondrial_contig [SO_0001921]

A contig of mitochondria derived sequences. Requested by Bayer Cropscience, October, 2012.

mitochondrial_DNA [SO_0001032]

DNA belonging to the genome of a mitochondrion. This term is used by MO.

mitochondrial_DNA_read [SO_0001929]

A sequencer read of a mitochondrial DNA sample. Requested by Bayer Cropscience, October, 2012.

mitochondrial_sequence [SO_0000737]

DNA belonging to the genome of a mitochondrion. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

mitochondrial_supercontig [SO_0001922]

A scaffold composed of mitochondrial contigs.

mitochondrial_targeting_signal [SO_0001808]

A polypeptide region that targets a polypeptide to the mitochondrion.

mitotic_recombination_region [SO_0002154]

A genomic region where there is an exchange of genetic material with another genomic region, occurring in somatic cells.

MNP [SO_0001013]

A multiple nucleotide polymorphism with alleles of common length > 1, for example AAA/TTT.

MNV [SO_0002007]

An MNV is a multiple nucleotide variant (substitution) in which the inserted sequence is the same length as the replaced sequence.
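
The length constraint in this definition can be illustrated with a minimal sketch that labels a reference/alternate allele pair. The function name `classify_substitution` and the label `not_a_substitution` are hypothetical, chosen only for this example; they are not part of SO.

```python
def classify_substitution(ref: str, alt: str) -> str:
    """Label a ref/alt allele pair using the equal-length rule for MNVs.

    Hypothetical helper: returns "SNV", "MNV", "no_sequence_alteration",
    or "not_a_substitution" (the last label is not an SO term).
    """
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt:
        return "not_a_substitution"  # an empty allele means insertion or deletion
    if len(ref) != len(alt):
        return "not_a_substitution"  # inserted and replaced lengths differ
    if ref == alt:
        return "no_sequence_alteration"  # identical to the reference
    return "SNV" if len(ref) == 1 else "MNV"
```

Under these assumptions, the AAA/TTT example given for MNP would be labelled `MNV`, because both alleles are three bases long.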

mobile_element_deletion [SO_0002066]

A deletion of a mobile element when comparing a reference sequence (has mobile element) to an individual sequence (does not have mobile element).

mobile_element_insertion [SO_0001837]

A kind of insertion where the inserted sequence is a mobile element. Requested by the EBI.

mobile_genetic_element [SO_0001037]

A nucleotide region with either intra-genome or intracellular mobility, of varying length, which often carries the information necessary for transfer and recombination with the host genome.

mobile_intron [SO_0000666]

An intron (mitochondrial, chloroplast, nuclear or prokaryotic) that encodes a double strand sequence specific endonuclease allowing for mobility.

modified_adenine [SO_0001962]

A modified adenine DNA base feature.

modified_adenosine [SO_0001273]

A modified adenosine is an adenosine base feature that has been altered.

modified_amino_acid_feature [SO_0001385]

A post translationally modified amino acid feature.

modified_cytidine [SO_0001275]

A modified cytidine is a cytidine base feature which has been altered.

modified_cytosine [SO_0001963]

A modified cytosine DNA base feature.

modified_DNA_base [SO_0000305]

A modified nucleotide, i.e. a nucleotide other than A, T, C, G. Modified base:<modified_base>.

modified_glycine [SO_0001386]

A post translationally modified glycine amino acid feature.

modified_guanine [SO_0001964]

A modified guanine DNA base feature.

modified_guanosine [SO_0001276]

A guanosine base that has been modified.

modified_inosine [SO_0001274]

A modified inosine is an inosine base feature that has been altered.

modified_L_alanine [SO_0001387]

A post translationally modified alanine amino acid feature.

modified_L_arginine [SO_0001406]

A post translationally modified arginine amino acid feature.

modified_L_asparagine [SO_0001388]

A post translationally modified asparagine amino acid feature.

modified_L_aspartic_acid [SO_0001389]

A post translationally modified aspartic acid amino acid feature.

modified_L_cysteine [SO_0001390]

A post translationally modified cysteine amino acid feature.

modified_L_glutamic_acid [SO_0001391]

A post translationally modified glutamic acid.

modified_L_glutamine [SO_0001394]

A post translationally modified glutamine amino acid feature.

modified_L_histidine [SO_0001398]

A post translationally modified histidine amino acid feature.

modified_L_isoleucine [SO_0001396]

A post translationally modified isoleucine amino acid feature.

modified_L_leucine [SO_0001401]

A post translationally modified leucine amino acid feature.

modified_L_lysine [SO_0001400]

A post translationally modified lysine amino acid feature.

modified_L_methionine [SO_0001395]

A post translationally modified methionine amino acid feature.

modified_L_phenylalanine [SO_0001397]

A post translationally modified phenylalanine amino acid feature.

modified_L_proline [SO_0001404]

A post translationally modified proline amino acid feature.

modified_L_selenocysteine [SO_0001402]

A post translationally modified selenocysteine amino acid feature.

modified_L_serine [SO_0001399]

A post translationally modified serine amino acid feature.

modified_L_threonine [SO_0001392]

A post translationally modified threonine amino acid feature.

modified_L_tryptophan [SO_0001393]

A post translationally modified tryptophan amino acid feature.

modified_L_tyrosine [SO_0001405]

A post translationally modified tyrosine amino acid feature.

modified_L_valine [SO_0001403]

A post translationally modified valine amino acid feature.

modified_RNA_base_feature [SO_0000250]

A post_transcriptionally modified base.

modified_uridine [SO_0001277]

A uridine base that has been modified.

molecular_contact_region [SO_0100002]

A region that is involved in a contact with another molecule. Range.

monocistronic [SO_0000878]

An attribute describing a sequence that contains the code for one gene product.

monocistronic_mRNA [SO_0000633]

An mRNA with either a single protein product, or for which the regions encoding all its protein products overlap.

monocistronic_primary_transcript [SO_0000632]

A primary transcript encoding one gene product.

monocistronic_transcript [SO_0000665]

A transcript that is monocistronic.

monomeric_repeat [SO_0001934]

A repeat_region containing repeat_units of 1 bp that is repeated multiple times in tandem.
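
A minimal sketch of how such single-base tandem runs might be located in a sequence. The function name `find_monomeric_repeats` and the `min_copies` threshold are assumptions made for this example; SO does not specify a minimum copy number.

```python
import re


def find_monomeric_repeats(seq: str, min_copies: int = 5):
    """Return (start, end, base) for each run of a single repeated base.

    Coordinates are 0-based and end-exclusive. min_copies is an assumed,
    user-chosen threshold and must be at least 2.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # One base followed by at least (min_copies - 1) copies of the same base.
    pattern = re.compile(r"([ACGTU])\1{%d,}" % (min_copies - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

For instance, with the default threshold the sequence `ACGAAAAAATTG` yields one run of six A bases starting at position 3, while the two-base TT run is ignored.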

moR [SO_0002032]

A non-coding transcript encoded by sequences adjacent to the ends of the 5’ and 3’ miR-encoding sequences that abut the loop in precursor miRNA. MoRs are generated from miR hairpins that are longer and can produce two functional miRs per strand. They are called moRs because they are not located next to the loop and thus their biogenesis process is a little different, but functionally, they are supposed to act like miRs. The same holds for loRs, which are the loop fragments: they are generated differently than miRs or moRs, but if loaded into the RISC they are supposed to act the same way miRs do. Requested by Thomas Desvignes, Jan 2015.

morpholino_backbone [SO_0001183]

An attribute describing a sequence composed of nucleobases bound to a morpholino backbone. A morpholino backbone consists of morpholine (CHEBI:34856) rings connected by phosphorodiamidate linkages. Do not use this for feature annotation. Use morpholino_oligo (SO:0000034) instead.

morpholino_oligo [SO_0000034]

Morpholino oligos are synthesized from four different Morpholino subunits, each of which contains one of the four genetic bases (A, C, G, T) linked to a 6-membered morpholine ring. Eighteen to 25 subunits of these four subunit types are joined in a specific order by non-ionic phosphorodiamidate intersubunit linkages to give a Morpholino.

mRNA_attribute [SO_0000863]

An attribute describing an mRNA feature.

mRNA_by_polyadenylation_status [SO_0000245]

[mRNA_by_polyadenylation_status]

mRNA_contig [SO_0001829]

A contig composed of mRNA_reads. Requested by Bayer Cropscience June, 2011.

mRNA_not_polyadenylated [SO_0000247]

[mRNA_not_polyadenylated]

mRNA_read [SO_0001827]

A sequencer read of an mRNA substrate. Requested by Bayer Cropscience June, 2011.

mRNA_recoded_by_codon_redefinition [SO_1001265]

A recoded_mRNA that was modified by an alteration of codon meaning.

mRNA_recoded_by_translational_bypass [SO_1001264]

A recoded_mRNA where translation was suspended at a particular codon and resumed at a particular non-overlapping downstream codon.

mRNA_region [SO_0000836]

A region of an mRNA. This term was added to provide a grouping term for the region parts of mRNA, thus giving them an is_a path back to the root.

mRNA_with_frameshift [SO_0000108]

An mRNA with a frameshift.

mRNA_with_minus_1_frameshift [SO_0000282]

An mRNA with a minus 1 frameshift.

mRNA_with_minus_2_frameshift [SO_0000335]

An mRNA with a minus 2 frameshift.

mRNA_with_plus_1_frameshift [SO_0000321]

An mRNA with a plus 1 frameshift.

mRNA_with_plus_2_frameshift [SO_0000329]

An mRNA with a plus 2 frameshift.

mt_rRNA [SO_0002128]

Mitochondrial rRNA is an RNA component of the small or large subunits of mitochondrial ribosomes. Updated definition to be consistent with format of other rRNA definitions on 10 June 2021. Requested by EBI. See GitHub Issue #493.

mt_tRNA [SO_0002129]

Mitochondrial transfer RNA.

MTE [SO_0001162]

A sequence element characteristic of some RNA polymerase II promoters, usually located between +20 and +30 relative to the TSS. Consensus sequence is CSARCSSAACGS. Tends to co-occur with INR motif (SO:0000014). Tends to not occur with DPE motif (SO:0000015) or DMv5 (SO:0001159).
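
Consensus strings such as CSARCSSAACGS use IUPAC nucleotide ambiguity codes (for example S = C or G, R = A or G). The sketch below shows one way to expand such a consensus into a regular expression and scan a sequence for it. The helper names `consensus_to_regex` and `find_motif` are hypothetical, and the scan covers only the strand that is passed in.

```python
import re

# IUPAC nucleotide ambiguity codes for DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> str:
    """Expand an IUPAC consensus string into a regular expression pattern."""
    return "".join(IUPAC[code] for code in consensus.upper())


def find_motif(consensus: str, seq: str):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=%s)" % consensus_to_regex(consensus))
    return [m.start() for m in pattern.finditer(seq.upper())]
```

The same helpers apply to the other consensus-defined motifs in this vocabulary, such as NDM2_motif (CGMYGYCR) or PCB (GNAACR). Positional constraints, such as the +20 to +30 window relative to the TSS, would need to be checked separately.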

multiplexing_sequence_identifier [SO_0002023]

A nucleic acid tag which is used in a ligation step of the library preparation process to allow pooling of samples while maintaining the ability to identify individual source material, resulting in a multiplexed library.

mutated_variant_site [SO_0001148]

Site which has been experimentally altered. Discrete.

mutation_causing_inframe_polypeptide_N_terminal_elongation [SO_1000106]

OBSOLETE: This term was deleted as it conflated more than one term. The alteration is separate from the effect. [elongated_in_frame_polypeptide_N_terminal_elongation; term replaced by; inframe polypeptide N-terminal elongation; mutation_causing_inframe_polypeptide_N_terminal_elongation; mutation causing inframe polypeptide N terminal elongation]

mutation_causing_out_of_frame_polypeptide_C_terminal_elongation [SO_1000109]

OBSOLETE: This term was deleted as it conflated more than one term. The alteration is separate from the effect. [term replaced by; mutation_causing_out_of_frame_polypeptide_C_terminal_elongation; mutation causing out of frame polypeptide C terminal elongation; elongated_out_of_frame_polypeptide_C_terminal; out of frame polypeptide C-terminal elongation]

mutation_causing_out_of_frame_polypeptide_N_terminal_elongation [SO_1000107]

OBSOLETE: This term was deleted as it conflated more than one term. The alteration is separate from the effect. [term replaced by; out of frame polypeptide N-terminal elongation; mutation_causing_out_of_frame_polypeptide_N_terminal_elongation; mutation causing out of frame polypeptide N terminal elongation; elongated_out_of_frame_polypeptide_N_terminal]

mutation_causing_polypeptide_C_terminal_elongation [SO_1000101]

OBSOLETE: This term was deleted as it conflated more than one term. The alteration is separate from the effect.

mutation_causing_polypeptide_N_terminal_elongation [SO_1000100]

OBSOLETE: This term was deleted as it conflated more than one term. The alteration is separate from the effect.

mutaton_causing_inframe_polypeptide_C_terminal_elongation [SO_1000108]

[term replaced by; mutaton_causing_inframe_polypeptide_C_terminal_elongation; elongated_in_frame_polypeptide_C_terminal; inframe_polypeptide C-terminal elongation; mutaton causing inframe polypeptide C terminal elongation]

N_region [SO_0001835]

Extra nucleotides inserted between rearranged immunoglobulin segments.

n_terminal_region [SO_0100014]

The amino-terminal positively-charged region of a signal peptide (approx 1-5 aa).

N2_2_prime_O_dimethylguanosine [SO_0001329]

N2_2prime_O_dimethylguanosine is a modified guanosine base feature.

N2_7_2prirme_O_trimethylguanosine [SO_0001343]

N2_7_2prirme_O_trimethylguanosine is a modified guanosine base feature.

N2_7_dimethylguanosine [SO_0001338]

N2_7_dimethylguanosine is a modified guanosine base feature.

N2_methylguanosine [SO_0001325]

N2_methylguanosine is a modified guanosine base feature.

N2_N2_2_prime_O_trimethylguanosine [SO_0001330]

N2_N2_2prime_O_trimethylguanosine is a modified guanosine base feature.

N2_N2_7_trimethylguanosine [SO_0001339]

N2_N2_7_trimethylguanosine is a modified guanosine base feature.

N2_N2_dimethylguanosine [SO_0001328]

N2_N2_dimethylguanosine is a modified guanosine base feature.

N4_2_prime_O_dimethylcytidine [SO_0001291]

N4,2’-O-dimethylcytidine is a modified cytidine.

N4_acetyl_2_prime_O_methylcytidine [SO_0001288]

N4-acetyl-2’-O-methylcytidine is a modified cytidine.

N4_acetylcytidine [SO_0001285]

N4-acetylcytidine is a modified cytidine.

N4_methylcytidine [SO_0001290]

N4-methylcytidine is a modified cytidine.

N4_N4_2_prime_O_trimethylcytidine [SO_0001294]

N4_N4_2_prime_O_trimethylcytidine is a modified cytidine.

N6_2_prime_O_dimethyladenosine [SO_0001312]

N6_2prime_O_dimethyladenosine is a modified adenosine.

N6_acetyladenosine [SO_0001315]

N6_acetyladenosine is a modified adenosine.

N6_cis_hydroxyisopentenyl_adenosine [SO_0001302]

N6_cis_hydroxyisopentenyl_adenosine is a modified adenosine.

N6_glycinylcarbamoyladenosine [SO_0001304]

N6_glycinylcarbamoyladenosine is a modified adenosine.

N6_hydroxynorvalylcarbamoyladenosine [SO_0001308]

N6_hydroxynorvalylcarbamoyladenosine is a modified adenosine.

N6_isopentenyladenosine [SO_0001300]

N6_isopentenyladenosine is a modified adenosine.

N6_methyl_N6_threonylcarbamoyladenosine [SO_0001307]

N6_methyl_N6_threonylcarbamoyladenosine is a modified adenosine.

N6_methyladenine [SO_0001920]

An adenine methylated at the 6 nitrogen.

N6_methyladenosine [SO_0001297]

N6_methyladenosine is a modified adenosine.

N6_N6_2_prime_O_trimethyladenosine [SO_0001313]

N6_N6_2prime_O_trimethyladenosine is a modified adenosine.

N6_N6_dimethyladenosine [SO_0001311]

N6_N6_dimethyladenosine is a modified adenosine.

N6_threonylcarbamoyladenosine [SO_0001305]

N6_threonylcarbamoyladenosine is a modified adenosine.

natural [SO_0000782]

An attribute describing a feature that occurs in nature.

natural_plasmid [SO_0001476]

A plasmid that occurs naturally.

natural_transposable_element [SO_0000797]

TE that exists (or existed) in nature.

natural_variant_site [SO_0001147]

Describes the natural sequence variants due to polymorphisms, disease-associated mutations, RNA editing and variations between strains, isolates or cultivars. Discrete.

nc_conserved_region [SO_0000334]

Non-coding region of sequence similarity by descent from a common ancestor.

nc_primary_transcript [SO_0000483]

A primary transcript that is never translated into a protein.

ncRNA_gene [SO_0001263]

A gene that encodes a non-coding RNA.

NDM2_motif [SO_0001167]

A non directional promoter motif with consensus CGMYGYCR.

NDM3_motif [SO_0001168]

A non directional promoter motif with consensus sequence GAAAGCT.

negative_sense_ssRNA_viral_sequence [SO_0001200]

A negative_sense_RNA_viral_sequence is a ss_RNA_viral_sequence that is the sequence of a single stranded RNA virus that is complementary to mRNA and must be converted to positive sense RNA by RNA polymerase before translation.

negatively_autoregulated [SO_0000473]

The gene product is involved in its own transcriptional regulation where it decreases transcription.

negatively_autoregulated_gene [SO_0000891]

A gene that is negatively autoregulated.

nested_region [SO_0001051]

[nested_region]

nested_repeat [SO_0001052]

[nested_repeat]

nested_repeat [SO_0001649]

A repeat that is disrupted by the insertion of another element.

nested_tandem_repeat [SO_0001658]

An NTR is a nested repeat of two distinct tandem motifs interspersed with each other. Tracker ID: 3052459.

nested_transposon [SO_0001053]

[nested_transposon]

nested_transposon [SO_0001648]

A transposon that is disrupted by the insertion of another element.

NMD_polymorphic_pseudogene_transcript [SO_0002118]

A polymorphic pseudogene transcript that contains a CDS but has one or more splice junctions >50bp downstream of stop codon. Premature stop codon is not introduced, directly or indirectly, as a result of the variation i.e. must be present in both protein_coding and pseudogenic alleles. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

NMD_transcript [SO_0002114]

A protein coding transcript that contains a CDS but has one or more splice junctions >50bp downstream of stop codon, making it susceptible to nonsense mediated decay. Term added as part of collaboration with Gencode, adding biotypes used in annotation.
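
The ">50bp downstream of stop codon" criterion can be sketched as a simple positional test. The coordinate convention here is an assumption made for illustration: positions are 0-based on the spliced transcript, `stop_codon_end` is the index of the first base after the stop codon, and each junction is given as the index of the first base of the downstream exon. The function name `has_nmd_triggering_junction` is hypothetical, and annotation pipelines may measure the distance differently.

```python
from typing import Iterable


def has_nmd_triggering_junction(
    stop_codon_end: int,
    junctions: Iterable[int],
    min_distance: int = 50,
) -> bool:
    """Return True if any splice junction lies more than min_distance bases
    downstream of the stop codon (assumed spliced-transcript coordinates)."""
    return any(j - stop_codon_end > min_distance for j in junctions)
```

With a stop codon ending at position 100, a junction at 160 satisfies the rule, whereas a junction at exactly 150 does not, because the distance must exceed 50 bases.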

NMD_transcript_variant [SO_0001621]

A variant in a transcript that is the target of nonsense-mediated mRNA decay.

no_output [SO_0100010]

An experimental region where an analysis has been run and not produced any annotation.

no_sequence_alteration [SO_0002073]

A position or feature within a sequence that is identical to the comparable position or feature of a specified reference sequence. This term is requested by the ClinVar data model group for use in the allele registry and such. A sequence at a defined location that is defined to match the reference assembly.

non_adjacent_residues [SO_0001083]

Indicates that two consecutive residues in a fragment sequence are not consecutive in the full-length protein and that there are a number of unsequenced residues between them.

non_allelic_homologous_recombination_region [SO_0002094]

A genomic region at a non-allelic position where exchange of genetic material happens as a result of homologous recombination.

non_AUG_initiated_uORF [SO_0002151]

A uORF beginning with a codon other than AUG.

non_canonical_five_prime_splice_site [SO_0000679]

A 5’ splice site which does not have the sequence “GT”.

non_canonical_splice_site [SO_0000674]

A splice site where the donor and acceptor sites differ from the canonical form.

non_canonical_start_codon [SO_0000680]

A start codon that is not the usual AUG sequence.

non_canonical_three_prime_splice_site [SO_0000678]

A 3’ splice site that does not have the sequence “AG”.
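
As a small illustration of the two splice site definitions above, the sketch below checks the terminal dinucleotides of an intron sequence. It treats only GT at the 5’ end and AG at the 3’ end as canonical, following the wording of these definitions. The function name `splice_site_status` and the returned dictionary keys are hypothetical.

```python
def splice_site_status(intron: str) -> dict:
    """Report whether the intron ends carry the canonical GT and AG dinucleotides.

    The intron is given 5' to 3' on the transcribed strand; RNA input (U)
    is converted to DNA letters (T) before comparison.
    """
    intron = intron.upper().replace("U", "T")
    if len(intron) < 4:
        raise ValueError("intron too short to hold both splice site dinucleotides")
    return {
        "five_prime_canonical": intron[:2] == "GT",
        "three_prime_canonical": intron[-2:] == "AG",
    }
```

An intron for which either value is False would correspond to non_canonical_five_prime_splice_site or non_canonical_three_prime_splice_site respectively.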

non_capped_primary_transcript [SO_0000106]

[non_capped_primary_transcript]

non_coding_transcript_exon_variant [SO_0001792]

A sequence variant that changes non-coding exon sequence in a non-coding transcript.

non_coding_transcript_intron_variant [SO_0001970]

A transcript variant occurring within an intron of a non coding transcript.

non_coding_transcript_splice_region_variant [SO_0002088]

A transcript variant occurring within the splice region (1-3 bases of the exon or 3-8 bases of the intron) of a non coding transcript.

non_coding_transcript_variant [SO_0001619]

A transcript variant of a non coding RNA gene. Within non-coding gene - Located within a gene that does not code for a protein.

non_conservative_amino_acid_substitution [SO_0001608]

A sequence variant of a codon causing the substitution of a non conservative amino acid for another in the resulting polypeptide.

non_conservative_missense_variant [SO_0001586]

A sequence variant whereby at least one base of a codon is changed resulting in a codon that encodes for an amino acid with different biochemical properties.

non_covalent_binding_site [SO_0001091]

Binding site for any chemical group (co-enzyme, prosthetic group, etc.). Discrete.

non_cytoplasmic_polypeptide_region [SO_0001074]

Polypeptide region that is localized outside of a lipid bilayer and outside of the cytoplasm. This could be inside an organelle within the cell.

non_LTR_retrotransposon [SO_0000189]

A retrotransposon without long terminal repeat sequences.

non_LTR_retrotransposon_polymeric_tract [SO_0000433]

A polymeric tract, such as poly(dA), within a non_LTR_retrotransposon.

non_processed_pseudogene [SO_0001760]

A pseudogene that arose from a means other than retrotransposition. A pseudogene created via genomic duplication of a functional protein-coding parent gene followed by accumulation of deleterious mutations.

non_protein_coding [SO_0000011]

A gene which can be transcribed, but will not be translated into a protein.

non_synonymous [SO_0001816]

A variant that leads to the change of an amino acid within the protein.

non_terminal_residue [SO_0001084]

The residue at an extremity of the sequence is not the terminal residue. Discrete.

non_transcribed_region [SO_0000183]

A region of the gene which is not transcribed.

nonamer_of_recombination_feature_of_vertebrate_immune_system_gene [SO_0000562]

Nine nucleotide recombination site, part of V-gene, D-gene or J-gene recombination feature of an immunoglobulin or T-cell receptor gene.

noncoding_exon [SO_0000198]

An exon that does not contain any codons.

noncoding_region_of_exon [SO_0001214]

The maximal intersection of exon and UTR. An exon either containing but not starting with a start codon or containing but not ending with a stop codon will be partially coding and partially non coding.

noncontiguous_finished [SO_0001490]

The status of a whole genome sequence, where the assembly is high quality, closure approaches have been successful for most gaps, misassemblies and low quality regions.

nonsynonymous_variant [SO_0001992]

A non-synonymous variant is an inframe, protein altering variant, resulting in a codon change.

novel_sequence_insertion [SO_0001838]

An insertion the sequence of which cannot be mapped to the reference genome. Requested by the NCBI.

NSD_transcript [SO_0002130]

A transcript that contains a CDS but has no stop codon before the polyA site is reached.

nuclear_chromosome [SO_0000828]

A chromosome originating in a nucleus.

nuclear_export_signal [SO_0001531]

A polypeptide region that targets a polypeptide to the cytoplasm.

nuclear_localization_signal [SO_0001528]

A polypeptide region that targets a polypeptide to the nucleus.

nuclear_mitochondrial [SO_0000899]

An attribute describing a nuclear pseudogene of a mitochondrial gene.

nuclear_mt_pseudogene [SO_0001044]

A nuclear pseudogene of either coding or non-coding mitochondria derived sequence. Definition change requested by Val, 3172757.

nuclear_rim_localization_signal [SO_0001534]

A polypeptide region that targets a polypeptide to the nuclear rim.

nuclear_sequence [SO_0000738]

DNA belonging to the nuclear genome of a cell. Moved from is_a SO:0000736 (organelle_sequence) when brought to our attention by GitHub issue #489.

nuclease_binding_site [SO_0000059]

A binding site that, in a nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a nuclease.

nuclease_hypersensitive_site [SO_0000322]

A region of nucleotide sequence targeted by a nuclease enzyme that is found cleaved more than would be expected by chance. Relationship to accessible_DNA_region added 11 Feb 2021. GREEKC pointed out that this is an assay based term, but we need a biological term for the accessible DNA. See GitHub Issue #531.

nuclease_sensitive_site [SO_0000684]

A region of nucleotide sequence targeted by a nuclease enzyme.

nucleic_acid [SO_0000348]

An attribute describing a sequence consisting of nucleobases bound to repeating units. The forms found in nature are deoxyribonucleic acid (DNA), where the repeating units are 2-deoxy-D-ribose rings connected to a phosphate backbone, and ribonucleic acid (RNA), where the repeating units are D-ribose rings connected to a phosphate backbone.

nucleomorph_gene [SO_0000097]

A gene from nucleomorph sequence.

nucleomorphic_chromosome [SO_0000829]

A chromosome originating in a nucleomorph.

nucleomorphic_sequence [SO_0000739]

DNA belonging to the genome of a plastid such as a chloroplast. The nucleomorph is the nucleus of the plastid.

nucleotide_binding_site [SO_0001655]

A binding site that, in the molecule, interacts selectively and non-covalently with nucleotide residues. See GO:0000166 : nucleotide binding.

nucleotide_cleavage_site [SO_0002204]

A point in nucleic acid where a cleavage event occurs.

nucleotide_match [SO_0000347]

A match against a nucleotide sequence.

nucleotide_motif [SO_0000714]

A region of nucleotide sequence corresponding to a known motif.

nucleotide_to_protein_binding_site [SO_0001654]

A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues.

null_mutation [SO_0002055]

A variant whereby the gene product is not functional or the gene product is not produced.

obsolete assembly_component [SO_0000143]

A region of known length which may be used to manufacture a longer region.

obsolete contig [SO_0000149]

A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N’s from unavailable bases.

octamer_motif [SO_0001258]

A sequence element characteristic of some RNA polymerase II promoters with sequence ATTGCAT that binds Pou-domain transcription factors. Nature. 1986 Oct 16-22;323(6089):640-3.

Okazaki_fragment [SO_0001985]

Any of the DNA segments produced by discontinuous synthesis of the lagging strand during DNA replication. Requested by Midori Harris, 2013.

oligo_U_tail [SO_0000609]

The string of non-encoded U’s at the 3’ end of a guide RNA (SO:0000602).

one_methyl_three_three_amino_three_carboxypropyl_pseudouridine [SO_0001373]

1_methyl_3_3_amino_3_carboxypropyl_pseudouridine is a modified uridine base feature.

one_methyladenosine [SO_0001295]

1_methyladenosine is a modified adenosine.

one_methylguanosine [SO_0001324]

1_methylguanosine is a modified guanosine base feature.

one_methylinosine [SO_0001278]

1-methylinosine is a modified inosine.

one_methylpseudouridine [SO_0001347]

1_methylpseudouridine is a modified uridine base feature.

one_two_prime_O_dimethyladenosine [SO_0001314]

1,2’-O-dimethyladenosine is a modified adenosine.

one_two_prime_O_dimethylguanosine [SO_0001340]

1_2prime_O_dimethylguanosine is a modified guanosine base feature.

one_two_prime_O_dimethylinosine [SO_0001279]

1,2’-O-dimethylinosine is a modified inosine.

open_chromatin_region [SO_0001747]

A DNA sequence that in the normal state of the chromosome corresponds to an unfolded, un-complexed stretch of double-stranded DNA. Requested by John Calley 3125900.

operon_member [SO_0000080]

A gene that is a member of an operon, which is a set of genes transcribed together as a unit.

ORF [SO_0000236]

The in-frame interval between the stop codons of a reading frame which when read as sequential triplets, has the potential of encoding a sequential string of amino acids. TER(NNN)nTER. The definition was modified by Rama. ORF is defined by the sequence, whereas the CDS is defined according to whether a polypeptide is made. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.
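
The TER(NNN)nTER pattern can be sketched as a scan for in-frame stop codons, reporting the intervals between consecutive stops. This is an illustration under stated assumptions: the standard genetic code stop codons (TAA, TAG, TGA), forward strand only, stop codons excluded from the reported interval, and a hypothetical `min_codons` filter. The function name `orfs_between_stops` is likewise invented for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (an assumption)


def orfs_between_stops(seq: str, frame: int = 0, min_codons: int = 1):
    """Return (start, end) intervals lying between consecutive in-frame stop codons.

    Coordinates are 0-based and end-exclusive, and the flanking stop codons
    are not included in the interval.
    """
    seq = seq.upper().replace("U", "T")
    # Positions of every stop codon in the chosen reading frame.
    stops = [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]
    orfs = []
    for left, right in zip(stops, stops[1:]):
        start, end = left + 3, right
        if (end - start) // 3 >= min_codons:
            orfs.append((start, end))
    return orfs
```

Note that, in line with the definition, no start codon is required: the interval is defined by the sequence alone, whereas a CDS depends on whether a polypeptide is actually made.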

organelle_sequence [SO_0000736]

A sequence of DNA that originates from an organelle.

oriC [SO_0000953]

An origin of bacterial chromosome replication.

oriV [SO_0000952]

An origin of vegetative replication in plasmids and phages.

orphan [SO_0000910]

A gene whose predicted amino acid sequence is unsupported by any experimental evidence or by any match with any other known sequence.

orphan_CDS [SO_1001247]

A CDS whose predicted amino acid sequence is unsupported by any experimental evidence or by any match with any other known sequence.

orthologous [SO_0000858]

An attribute describing a kind of homology where divergence occurred after a speciation event.

outron [SO_0001475]

A region of a primary transcript, that is removed via trans splicing.

overlapping [SO_0000068]

An attribute describing a gene that has a sequence that overlaps the sequence of another gene.

overlapping_EST_set [SO_0001262]

A continuous experimental result region extending the length of multiple overlapping ESTs.

overlapping_feature_set [SO_0001261]

A continuous region of sequence composed of the overlapping of multiple sequence_features, which ultimately provides evidence for another sequence_feature. This feature was requested by Nicole, tracker id 1911479. It is required to gather evidence together for annotation. An example would be overlapping ESTs that support an mRNA.

P_TIR_transposon [SO_0001535]

A P-element is a DNA transposon responsible for hybrid dysgenesis. P elements in this terminal inverted repeat (TIR) transposon superfamily have 31 bp perfect TIR and upon insertion duplicate an 8 bp sequence. It contains transposase that may lack the DDE domain. Moved from under DNA_transposon (SO:0000182) by Dave Sant as per request from GitHub issue #488 on June 25, 2020.

PAC [SO_0000154]

P1-derived artificial chromosomes are DNA constructs that are derived from the DNA of P1 bacteriophage. They can carry large amounts (about 100-300 kilobases) of other sequences for a variety of bioengineering purposes. It is one type of vector used to clone DNA fragments (100- to 300-kb insert size; average, 150 kb) in Escherichia coli cells. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. Drosophila melanogaster PACs carry an average insert size of 80 kb. The library represents a 6-fold coverage of the genome.

PAC_clone [SO_0000762]

[P1_clone; PAC_clone]

PAC_end [SO_0001480]

A region of sequence from the end of a PAC clone that may provide a highly specific marker.

paired_end_fragment [SO_0001790]

An assembly region that has been sequenced from both ends resulting in a read_pair (mate_pair).

paracentric [SO_0001519]

An inversion event that does not include the centromere.

paracentric_inversion [SO_1000047]

A chromosomal inversion that does not include the centromere.

parallel_beta_strand [SO_0001113]

A peptide region which is hydrogen bonded to another region of peptide running in the same direction (both running N-terminal to C-terminal). This orientation is slightly less stable because it introduces nonplanarity in the inter-strand hydrogen bonding pattern. Hydrogen bonding occurs between every other C=O from one strand to every other N-H on the adjacent strand. In this case, if two atoms C-alpha (i) and C-alpha (j) are adjacent in two hydrogen-bonded beta strands, then they do not hydrogen bond to each other; rather, one residue forms hydrogen bonds to the residues that flank the other (but not vice versa). For example, residue i may form hydrogen bonds to residues j - 1 and j + 1; this is known as a wide pair of hydrogen bonds. By contrast, residue j may hydrogen-bond to different residues altogether, or to none at all. The dihedral angles (phi, psi) are about (-120 degrees, 115 degrees) in parallel sheets. Range.

paralogous [SO_0000859]

An attribute describing a kind of homology where divergence occurred after a duplication event.

partial_genomic_sequence_assembly [SO_0001876]

A partial DNA sequence assembly of a chromosome or full genome, which contains gaps that are filled with N’s. Requested by Bayer Cropscience January, 2012.

partially_characterized_chromosomal_mutation [SO_1000175]

A chromosome structure variant that has not been characterized fully.

partially_processed_cDNA_clone [SO_0000813]

A cDNA clone invalidated by partial processing.

paternal_uniparental_disomy [SO_0001746]

Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from the father and no copies of the same chromosome or region from the mother.

paternal_variant [SO_0001776]

A variant in the genetic material inherited from the father.

paternally_imprinted [SO_0000136]

The paternal copy of the gene is modified, rendering it transcriptionally silent.

paternally_imprinted_gene [SO_0000889]

A gene that is paternally imprinted.

pathogenic_island [SO_0000773]

Mobile genetic elements that contribute to rapid changes in virulence potential. They are present on the genomes of pathogenic strains but absent from the genomes of non pathogenic members of the same or related species. Ulrich Dobrindt, Bianca Hochhut, Ute Hentschel & Jorg Hacker. Genomic islands in pathogenic and environmental microorganisms. Nature Reviews Microbiology 2, 414-424 (2004); doi:10.1038/nrmicro884.

PCB [SO_0001871]

A promoter element with consensus sequence GNAACR, bound by the transcription factor complex PBF (PCB-binding factor) and found in promoters of genes expressed during the M/G1 transition of the cell cycle.

PCR_product [SO_0000006]

A region amplified by a PCR reaction. This term is mapped to MGED. This term is now located in OBI, with the following ID OBI_0000406.

pedigree_specific_variant [SO_0001779]

A variant that is found only in individuals that belong to the same pedigree.

peptide_coil [SO_0100012]

Irregular, unstructured regions of a protein’s backbone, as distinct from the regular region (namely alpha helix and beta strand - characterised by specific patterns of main-chain hydrogen bonds).

peptide_collection [SO_0001501]

A collection of peptide sequences. Term requested via tracker ID: 2910829.

peptide_helix [SO_0001114]

A helix is a secondary_structure conformation where the peptide backbone forms a coil. Range.

peptide_localization_signal [SO_0001527]

A region of peptide sequence used to target the polypeptide molecule to a specific organelle.

peptidyl [SO_0001407]

An attribute describing the nature of a proteinaceous polymer, whereby the amino acid units are joined by peptide bonds.

pericentric [SO_0001518]

An inversion event that includes the centromere.

pericentric_inversion [SO_1000046]

A chromosomal inversion that includes the centromere.

peroxywybutosine [SO_0001333]

Peroxywybutosine is a modified guanosine base feature.

Phage_RNA_Polymerase_Promoter [SO_0001204]

A region (DNA) to which Bacteriophage RNA polymerase binds, to begin transcription. Former parent RNA_polymerase_promoter SO:0001203 was merged with promoter SO:0000167 in Aug 2020 as part of GREEKC.

phagemid_clone [SO_0000761]

[phagemid_clone]

phenylalanine [SO_0001441]

A non-polar, hydrophobic amino acid encoded by the codons TTT and TTC. A place holder for a cross product with chebi.

phenylalanine_tRNA_primary_transcript [SO_0000224]

A primary transcript encoding phenylalanyl tRNA (SO:0000267).

phenylalanyl_tRNA [SO_0000267]

A tRNA sequence that has a phenylalanine anticodon, and a 3’ phenylalanine binding region.

pheromone_response_element [SO_0002045]

A PRE is a (yeast) TFBS with consensus site [TGAAAC(A/G)]. Requested by Rama, SGD.

phosphorylation_site [SO_0001811]

A post-translationally modified region in which residues of the protein are modified by phosphorylation.

PIP_box [SO_0001810]

A polypeptide region that mediates binding to PCNA. The consensus sequence is QXX(hh)XX(aa), where (h) denotes residues with moderately hydrophobic side chains and (a) denotes residues with highly hydrophobic aromatic side chains.

piRNA_gene [SO_0001638]

A gene that encodes a piwi-associated RNA. Moved from ncRNA_gene to sncRNA_gene 27 April 2021 to be more consistent with the organization of the ncRNA branch of SO. Requested by FlyBase, moved by Dave Sant. See GitHub Issue #514.

plasmid [SO_0000155]

A self-replicating, often circular nucleic acid molecule that uses the host’s cellular machinery and is distinct from a chromosome in the organism. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

plasmid_clone [SO_0000759]

[plasmid_clone]

plasmid_gene [SO_0000098]

A gene from plasmid sequence.

plasmid_location [SO_0000749]

The location of DNA that has come from a plasmid sequence.

plastid_gene [SO_0000090]

A gene from plastid sequence.

plastid_sequence [SO_0000740]

DNA belonging to the genome of a plastid such as a chloroplast.

plus_1_frameshift [SO_0000868]

A frameshift caused by inserting one base.

plus_1_frameshift_variant [SO_0001594]

A sequence variant which causes a disruption of the translational reading frame, by shifting one base backward.

plus_1_translational_frameshift [SO_0001211]

The region of mRNA 1 base long that is skipped during the process of translational frameshifting (GO:0006452), causing the reading frame to be different.

plus_1_translationally_frameshifted [SO_1001263]

An attribute describing a translational frameshift of +1.

plus_2_frameshift_variant [SO_0001595]

A sequence variant which causes a disruption of the translational reading frame, by shifting two bases backward.

plus_2_framshift [SO_0000869]

A frameshift caused by inserting two bases.

plus_2_translational_frameshift [SO_0001212]

The region of mRNA 2 bases long that is skipped during the process of translational frameshifting (GO:0006452), causing the reading frame to be different.

PNA [SO_0001184]

An attribute describing a sequence composed of peptide nucleic acid (CHEBI:48021), a chemical consisting of nucleobases bound to a backbone composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. The purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds. Do not use this term for feature annotation. Use PNA_oligo (SO:0001011) instead.

PNA_oligo [SO_0001011]

Peptide nucleic acid is a chemical not known to occur naturally but is artificially synthesized and used in some biological research and medical treatments. The PNA backbone is composed of repeating N-(2-aminoethyl)-glycine units linked by peptide bonds. The purine and pyrimidine bases are linked to the backbone by methylene carbonyl bonds.

point_centromere [SO_0001794]

A point centromere is a relatively small centromere (about 125 bp DNA) in discrete sequence, found in some yeast including S. cerevisiae.

point_mutation [SO_1000008]

A single nucleotide change which has occurred at the same position of a corresponding nucleotide in a reference sequence.

polinton [SO_0001170]

A kind of DNA transposon that populates the genomes of protists, fungi, and animals, characterized by a unique set of proteins necessary for their transposition, including a protein-primed DNA polymerase B, retroviral integrase, cysteine protease, and ATPase. Polintons are characterized by 6-bp target site duplications, terminal-inverted repeats that are several hundred nucleotides long, and 5’-AG and TC-3’ termini. Polintons exist as autonomous and nonautonomous elements.

polyA_primed_cDNA_clone [SO_0000812]

A cDNA clone invalidated by polyA priming.

polyA_sequence [SO_0000610]

Sequence of about 100 nucleotides of A added to the 3’ end of most eukaryotic mRNAs.

polyA_signal_sequence [SO_0000551]

The recognition sequence necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527.

polyA_site [SO_0000553]

The site on an RNA transcript to which adenine residues will be added by post-transcriptional polyadenylation. The boundary between the UTR and the polyA sequence.

polyadenylated [SO_0000246]

An attribute describing the addition of a poly A tail to the 3’ end of an mRNA molecule.

polyadenylated_mRNA [SO_0000871]

An mRNA that is polyadenylated.

polyadenylation_variant [SO_0001545]

A sequence variant that changes polyadenylation with respect to a reference sequence.

polycistronic [SO_0000880]

An attribute describing a sequence that contains the code for more than one gene product.

polycistronic_gene [SO_0000477]

A gene that is polycistronic.

polycistronic_primary_transcript [SO_0000631]

A primary transcript encoding for more than one gene product.

polycistronic_transcript [SO_0000078]

A transcript that is polycistronic.

polymer_attribute [SO_0000443]

An attribute to describe the kind of biological sequence.

polymerase_synthesis_read [SO_0001426]

A read produced by the polymerase based sequence by synthesis method. An example is a read produced by Illumina technology.

polymorphic_pseudogene [SO_0001841]

A pseudogene in the reference genome, though known to be intact in the genomes of other individuals of the same species. The annotation process has confirmed that the pseudogenisation event is not a genomic sequencing error. This term is used by Ensembl and Vega. Pseudogene owing to a SNP/DIP but in other individuals/haplotypes/strains the gene is translated.

polymorphic_pseudogene_processed_transcript [SO_0002116]

A processed transcript that does not contain a CDS that fulfills annotation criteria and is not necessarily functionally non-coding. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

polymorphic_pseudogene_with_retained_intron [SO_0002110]

A polymorphic pseudogene in the reference genome, containing a retained intron, known to be intact in the genomes of other individuals of the same species. The annotation process has confirmed that the pseudogenisation event is not a genomic sequencing error. Term added as part of collaboration with Gencode, adding biotypes used in annotation.

polymorphic_sequence_variant [SO_0001025]

A sequence variant that is segregating in one or more natural populations of a species.

polymorphic_variant [SO_0001766]

A variant that affects one of several possible alleles at that location, such as the major histocompatibility complex (MHC) genes.

polypeptide [SO_0000104]

A sequence of amino acids linked by peptide bonds which may lack appreciable tertiary structure and may not be liable to irreversible denaturation. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. The term ‘protein’ was merged with ‘polypeptide’. Although ‘protein’ was a sequence_attribute and therefore meant to describe the quality rather than an actual feature, it was being used erroneously. It is replaced by ‘peptidyl’ as the polymer attribute.

polypeptide_binding_motif [SO_0100018]

A polypeptide binding motif is a short (up to 20 amino acids) polypeptide region of biological interest that contains one or more amino acids experimentally shown to bind to a ligand.

polypeptide_calcium_ion_contact_site [SO_0001094]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with calcium ions. Residue involved in contact with calcium.

polypeptide_catalytic_motif [SO_0100019]

A polypeptide catalytic motif is a short (up to 20 amino acids) polypeptide region that contains one or more active site residues.

polypeptide_cobalt_ion_contact_site [SO_0001095]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with cobalt ions.

polypeptide_conserved_motif [SO_0100017]

A conserved motif is a short (up to 20 amino acids) region of biological interest that is conserved in different proteins. They may or may not have functional or structural significance within the proteins in which they are found.

polypeptide_conserved_region [SO_0100021]

A subsection of sequence with biological interest that is conserved in different proteins. They may or may not have functional or structural significance within the proteins in which they are found.

polypeptide_copper_ion_contact_site [SO_0001096]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with copper ions.

polypeptide_DNA_contact [SO_0100020]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with DNA.

polypeptide_domain [SO_0000417]

A structurally or functionally defined protein region. In proteins with multiple domains, the combination of the domains determines the function of the protein. A region which has been shown to recur throughout evolution. Range. Old definition from before biosapiens: A region of a single polypeptide chain that folds into an independent unit and exhibits biological activity. A polypeptide chain may have multiple domains.

polypeptide_function_variant [SO_0001554]

A sequence variant which changes polypeptide functioning with respect to a reference sequence.

polypeptide_fusion [SO_0001616]

A sequence variant that causes a fusion of two polypeptide sequences.

polypeptide_gain_of_function_variant [SO_0001557]

A sequence variant which causes gain of polypeptide function with respect to a reference sequence.

polypeptide_iron_ion_contact_site [SO_0001097]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with iron ions.

polypeptide_ligand_contact [SO_0001105]

Residues which interact with a ligand.

polypeptide_localization_variant [SO_0001558]

A sequence variant which changes the localization of a polypeptide with respect to a reference sequence.

polypeptide_loss_of_function_variant [SO_0001559]

A sequence variant that causes the loss of a polypeptide function with respect to a reference sequence.

polypeptide_magnesium_ion_contact_site [SO_0001098]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with magnesium ions.

polypeptide_manganese_ion_contact_site [SO_0001099]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with manganese ions.

polypeptide_metal_contact [SO_0001092]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with metal ions. Residue is part of a binding site for a metal ion.

polypeptide_molybdenum_ion_contact_site [SO_0001100]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with molybdenum ions.

polypeptide_motif [SO_0001067]

A sequence motif is a short (up to 20 amino acids) region of biological interest. Such motifs, although they are too short to constitute functional domains, share sequence similarities and are conserved in different proteins. They display a common function (protein-binding, subcellular location etc.). Range.

polypeptide_nest_left_right_motif [SO_0001121]

A motif of two consecutive residues with dihedral angles: Residue(i): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees. Residue(i+1): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees.

polypeptide_nest_motif [SO_0001120]

A motif of two consecutive residues with dihedral angles. Nest should not have Proline as any residue. Nests frequently occur as parts of other motifs such as Schellman loops.

polypeptide_nest_right_left_motif [SO_0001122]

A motif of two consecutive residues with dihedral angles: Residue(i): -140 degrees < phi < -20 degrees, -90 degrees < psi < +40 degrees. Residue(i+1): +20 degrees < phi < +140 degrees, -40 degrees < psi < +90 degrees.
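The two nest definitions use the same pair of (phi, psi) windows in opposite order, which can be sketched as a small Python classifier. This is only an illustration under stated assumptions: the function names are hypothetical, angles are taken in degrees, the bounds are treated as strict inequalities as written in the definitions, and the rule that a nest should not contain proline is not checked.

```python
def in_positive_phi_window(phi, psi):
    """+20 < phi < +140 and -40 < psi < +90 (degrees)."""
    return 20 < phi < 140 and -40 < psi < 90

def in_negative_phi_window(phi, psi):
    """-140 < phi < -20 and -90 < psi < +40 (degrees)."""
    return -140 < phi < -20 and -90 < psi < 40

def classify_nest(residue_i, residue_next):
    """Classify two consecutive residues, each given as a (phi, psi) tuple.

    Returns the matching SO term name, or None if neither definition applies.
    """
    if in_positive_phi_window(*residue_i) and in_negative_phi_window(*residue_next):
        return "polypeptide_nest_left_right_motif"
    if in_negative_phi_window(*residue_i) and in_positive_phi_window(*residue_next):
        return "polypeptide_nest_right_left_motif"
    return None

print(classify_nest((60, 30), (-90, 0)))   # polypeptide_nest_left_right_motif
print(classify_nest((-90, 0), (60, 30)))   # polypeptide_nest_right_left_motif
```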

polypeptide_nickel_ion_contact_site [SO_0001101]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with nickel ions.

polypeptide_partial_loss_of_function [SO_0001561]

A sequence variant that causes some but not all loss of polypeptide function with respect to a reference sequence.

polypeptide_post_translational_processing_affected [SO_1000123]

[polypeptide_post-translational_processing_affected; polypeptide_post_translational_processing_affected]

polypeptide_post_translational_processing_variant [SO_0001562]

A sequence variant that causes a change in post translational processing of the peptide with respect to a reference sequence.

polypeptide_region [SO_0000839]

Biological sequence region that can be assigned to a specific subsequence of a polypeptide. Added to allow the polypeptide regions to have is_a paths back to the root.

polypeptide_repeat [SO_0001068]

A polypeptide_repeat is a single copy of an internal sequence repetition. Range.

polypeptide_secondary_structure [SO_0001078]

A region of peptide with secondary structure has hydrogen bonding along the peptide chain that causes a defined conformation of the chain. Biosapien term was secondary_structure.

polypeptide_sequence_variant [SO_0001603]

A sequence variant within the CDS that causes a change in the resulting polypeptide sequence.

polypeptide_sequencing_information [SO_0001082]

Incompatibility in the sequence due to some experimental problem. Range.

polypeptide_structural_region [SO_0001070]

Region of polypeptide with a given structural property. Range.

polypeptide_truncation [SO_0001617]

A sequence variant of the CDS that causes a truncation of the resulting polypeptide.

polypeptide_tungsten_ion_contact_site [SO_0001102]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with tungsten ions.

polypeptide_turn_motif [SO_0001128]

A reversal in the direction of the backbone of a protein that is stabilized by a hydrogen bond between backbone NH and CO groups, involving no more than 4 amino acid residues. Range.

polypeptide_variation_site [SO_0001146]

A site of sequence variation (alteration). Alternative sequence due to naturally occurring events such as polymorphisms and alternative splicing or experimental methods such as site directed mutagenesis. For example, was a substitution natural or mutated as part of an experiment? This term is added to merge the biosapiens term sequence_variations.

polypeptide_zinc_ion_contact_site [SO_0001103]

A binding site that, in the polypeptide molecule, interacts selectively and non-covalently with zinc ions.

polypyrimidine_tract [SO_0000612]

The polypyrimidine tract is one of the cis-acting sequence elements directing intron removal in pre-mRNA splicing.

population_specific_variant [SO_0001780]

A variant found only within specific populations.

positional_candidate_gene [SO_0001868]

A candidate gene whose association with a trait is based on the gene’s location on a chromosome. Requested by Bayer Cropscience December, 2011.

positive_sense_ssRNA_viral_sequence [SO_0001201]

A positive_sense_RNA_viral_sequence is a ss_RNA_viral_sequence that is the sequence of a single stranded RNA virus that can be immediately translated by the host.

positively_autoregulated [SO_0000475]

The gene product is involved in its own transcriptional regulation, where it increases transcription.

positively_autoregulated_gene [SO_0000892]

A gene that is positively autoregulated.

possible_assembly_error [SO_0000702]

A region of sequence where there may have been an error in the assembly.

possible_base_call_error [SO_0000701]

A region of sequence where the validity of the base calling is questionable.

post_translationally_regulated [SO_0000130]

An attribute describing a gene that is regulated after it has been translated.

post_translationally_regulated_by_protein_modification [SO_0000469]

An attribute describing a gene sequence where the resulting protein is modified to regulate it.

post_translationally_regulated_by_protein_stability [SO_0000467]

An attribute describing a gene sequence where the resulting protein is regulated by the stability of the resulting protein.

post_translationally_regulated_gene [SO_0000890]

A gene that is post translationally regulated.

pre_edited_mRNA [SO_0000932]

A primary transcript that has not been edited and that, at least in part, encodes one or more proteins.

pre_edited_region [SO_0000583]

The region of a transcript that will be edited.

pre_miRNA [SO_0001244]

The 60-70 nucleotide region that remains after Drosha processing of the primary transcript and folds back upon itself to form a hairpin structure.

predicted_ab_initio_computation [SO_0000310]

[predicted_ab_initio_computation]

predicted_by_ab_initio_computation [SO_0000911]

An attribute describing a feature that is predicted by a computer program that did not rely on sequence similarity.

predicted_gene [SO_0000996]

A region of the genome that has been predicted to be a gene but has not been confirmed by laboratory experiments. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

predicted_transcript [SO_0002138]

A transcript feature that has been predicted but is not yet validated.

primary_transcript [SO_0000185]

A transcript that in its initial state requires modification to be functional.

primary_transcript_attribute [SO_0000144]

[primary_transcript_attribute]

primary_transcript_region [SO_0000835]

A part of a primary transcript. This term was added to provide a grouping term for the region parts of primary_transcript, thus giving them an is_a path back to the root.

primer [SO_0000112]

An oligo to which new deoxyribonucleotides can be added by DNA polymerase.

primer_match [SO_0001472]

A nucleotide match to a primer sequence.

priRNA [SO_0002022]

A small RNA molecule, 22-23 nt in size, that is the product of a longer RNA. The production of priRNAs is independent of dicer and involves binding of RNA by argonaute and trimming by triman. In fission yeast, priRNAs trigger the establishment of heterochromatin. PriRNAs are primarily generated from centromeric transcripts (dg and dh repeats), but may also be produced from degradation products of primary transcripts.

processed [SO_0000900]

An attribute describing a pseudogene whereby an mRNA was retrotransposed. The mRNA sequence is transcribed back into the genome, lacking introns and promoters, but often including a polyA tail.

processed_pseudogene [SO_0000043]

A pseudogene created via retrotransposition of the mRNA of a functional protein-coding parent gene followed by accumulation of deleterious mutations, lacking introns and promoters and often including a polyA tail. Please note the synonym R psi M uses the spelled-out form of the Greek letter.

processed_transcript [SO_0001503]

A transcript for which no open reading frame has been identified and for which no other function has been determined. Ensembl and Vega also use this term name. Requested by Howard Deen of MGI.

processed_transcript_attribute [SO_0000082]

[processed_transcript_attribute]

prokaryotic_promoter [SO_0002222]

A regulatory_region essential for the specific initiation of transcription at a defined location in a DNA molecule, although this location might not be one single base. It is recognized by a specific RNA polymerase(RNAP)-holoenzyme, and this recognition is not necessarily autonomous.

proline [SO_0001439]

A non-polar, hydrophobic amino acid encoded by the codons CCN (CCT, CCC, CCA and CCG). A place holder for a cross product with chebi.

proline_tRNA_primary_transcript [SO_0000225]

A primary transcript encoding prolyl tRNA (SO:0000268).

prolyl_tRNA [SO_0000268]

A tRNA sequence that has a proline anticodon, and a 3’ proline binding region.

promoter_element [SO_0001659]

An element that can exist within the promoter region of a gene. Moved from is_a: SO:0001055 transcriptional_cis_regulatory_region as per request from GREEKC initiative in August 2020.

promoter_flanking_region [SO_0001952]

A region immediately adjacent to a promoter which may or may not contain transcription factor binding sites.

promoter_region [SO_0000832]

A region of sequence which is part of a promoter. This is a manufactured term to allow the parts of promoter to have an is_a path back to the root.

promoter_targeting_sequence [SO_0001058]

A transcriptional_cis_regulatory_region that restricts the activity of a CRM to a single promoter and which functions only when both itself and an insulator are located between the CRM and the promoter. Obsoleted Jan 21, 2021 by Dave Sant. GREEKC consortium individuals pointed out that this did not fit with the other child terms of transcriptional_cis_regulatory_region (SO:0001055), which are currently promoter, CRM and promoter flanking region. No comments about when this term was created exist, and no references are listed. GREEKC members assume that this was previously under enhanceosome (SO:0001057), which was probably created along with this term but has since been obsoleted. This term can be resurrected as non-obsolete if we can find a reference publication and/or change the name to a term that is commonly used in the field.

promoter_trap_construct [SO_0001478]

A construct which is designed to integrate into a genome and express a reporter when inserted in close proximity to a promoter element. Promoter traps typically do not contain promoter elements and are mutagenic.

propeptide [SO_0001062]

Part of a peptide chain which is cleaved off during the formation of the mature protein. Range.

propeptide_cleavage_site [SO_0001061]

The propeptide_cleavage_site is the arginine/lysine boundary on a propeptide where cleavage occurs. Discrete.

propeptide_region_of_CDS [SO_0002250]

A CDS region corresponding to a propeptide of a polypeptide. Added as per request from GitHub Issue #484 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/484)

prophage [SO_0001006]

A phage genome after it has established in the host genome in a latent/immune state either as a plasmid or as an integrated “island”.

proplastid_gene [SO_0000096]

A gene from proplastid sequence.

proplastid_sequence [SO_0000748]

DNA belonging to the genome of a proplastid such as an immature chloroplast.

protease_site [SO_0001956]

A polypeptide_region that codes for a protease cleavage site.

protein_altering_variant [SO_0001818]

A sequence_variant which is predicted to change the protein encoded in the coding sequence.

protein_binding_site [SO_0000410]

A binding site that, in the molecule, interacts selectively and non-covalently with polypeptide molecules. See GO:0042277 : peptide binding.

protein_coding [SO_0000010]

A gene which, when transcribed, can be translated into a protein.

protein_coding_gene [SO_0001217]

A gene that codes for an RNA that can be translated into a protein.

protein_coding_primary_transcript [SO_0000120]

A primary transcript that, at least in part, encodes one or more proteins. May contain introns.

protein_hmm_match [SO_0001831]

A match to a protein HMM such as pfam.

protein_match [SO_0000349]

A match against a protein sequence.

protein_protein_contact [SO_0001093]

A binding site that, in the protein molecule, interacts selectively and non-covalently with polypeptide residues.

protein_stability_element [SO_0001955]

A polypeptide region that provides structure in a protein that affects the stability of the protein.

proviral_gene [SO_0000099]

A gene from proviral sequence.

proviral_location [SO_0000751]

The location of DNA that has come from a viral origin.

proviral_region [SO_0000113]

A viral sequence which has integrated into a host genome.

proximal_promoter_element [SO_0001668]

DNA segment that ranges from about -250 to -40 relative to +1 of RNA transcription start site, where sequence-specific DNA-binding transcription factors bind, such as Sp1, CTF (CCAAT-binding transcription factor), and CBF (CCAAT-box binding factor).

PSE_motif [SO_0000017]

A sequence element characteristic of the promoters of snRNA genes transcribed by RNA polymerase II or by RNA polymerase III. Located between -45 and -60 relative to the TSS. The human PSE_motif consensus sequence is TCACCNTNA(C|G)TNAAAAG(T|G). The basal transcription factor, snRNA-activating protein complex (SNAPc), binds the PSE_motif and is required for the transcription of both RNA polymerase II and III transcribed small-nuclear RNA genes.

pseudogene [SO_0000336]

A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their “normal” paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its “normal” paralog).

pseudogene_attribute [SO_0000042]

An attribute of a pseudogene (SO:0000336).

pseudogene_by_unequal_crossing_over [SO_0000044]

A pseudogene caused by unequal crossing over at recombination.

pseudogene_processed_transcript [SO_0002111]

A processed_transcript supported by EST and/or mRNA evidence that aligns unambiguously to a pseudogene locus (i.e. alignment to the pseudogene locus clearly better than alignment to parent locus). Term added as part of collaboration with Gencode, adding biotypes used in annotation.

pseudogenic_CDS [SO_0002087]

A non functional descendant of the coding portion of a coding transcript, part of a pseudogene.

pseudogenic_exon [SO_0000507]

A non functional descendant of an exon, part of a pseudogene. This is the analog of the exon of a functional gene. The term was requested by Rama - SGD to allow the annotation of the parts of a pseudogene. Non-functional is defined as either its transcription or translation (or both) are prevented due to one or more mutations.

pseudogenic_gene_segment [SO_0001741]

A gene segment which when incorporated by somatic recombination in the final gene transcript results in a nonfunctional product.

pseudogenic_region [SO_0000462]

A non-functional descendant of a functional entity.

pseudogenic_rRNA [SO_0000777]

A non functional descendant of an rRNA. Added Jan 2006 to allow the annotation of the pseudogenic rRNA by flybase. Non-functional is defined as its transcription is prevented due to one or more mutations.

pseudogenic_transcript [SO_0000516]

A non functional descendant of a transcript, part of a pseudogene. This is the analog of the transcript of a functional gene. The term was requested by Rama - SGD to allow the annotation of the parts of a pseudogene. Non-functional is defined as either its transcription or translation (or both) are prevented due to one or more mutations.

pseudogenic_transcript_with_retained_intron [SO_0002115]

A transcript supported by EST and/or mRNA evidence that aligns unambiguously to the pseudogene locus; has retained intronic sequence compared to a reference transcript sequence. These terms have been requested by Adam Frankish to support Gencode and Vega biotypes.

pseudogenic_tRNA [SO_0000778]

A non functional descendant of a tRNA. Added Jan 2006 to allow the annotation of the pseudogenic tRNA by flybase. Non-functional is defined as its transcription is prevented due to one or more mutations.

pseudoknot [SO_0000591]

A tertiary structure in RNA where nucleotides in a loop form base pairs with a region of RNA downstream of the loop.

pseudouridylation_guide_snoRNA [SO_0001187]

A snoRNA that specifies the site of pseudouridylation in an RNA molecule by base pairing with a short sequence around the target residue. Has RNA pseudouridylation guide activity (GO:0030558).

purine_to_pyrimidine_transversion [SO_1000023]

Change of a purine nucleotide, A or G, into a pyrimidine nucleotide, C or T.

purine_transition [SO_1000014]

A substitution of a purine, A or G, for another purine.

pyrimidine_to_purine_transversion [SO_1000018]

Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G.

pyrimidine_transition [SO_1000010]

A substitution of a pyrimidine, C or T, for another pyrimidine.
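The four substitution terms (purine_transition, pyrimidine_transition and the two transversions) partition all single-base DNA changes by the chemical class of the reference and alternate base. A minimal Python sketch of that mapping follows; `classify_substitution` is a hypothetical helper written for this illustration, and it assumes unambiguous DNA bases (A, C, G, T) only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref, alt):
    """Return the SO term name for a single-base substitution ref -> alt."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected unambiguous DNA bases A, C, G or T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    if ref in PURINES:
        if alt in PURINES:
            return "purine_transition"
        return "purine_to_pyrimidine_transversion"
    if alt in PYRIMIDINES:
        return "pyrimidine_transition"
    return "pyrimidine_to_purine_transversion"

print(classify_substitution("A", "G"))  # purine_transition
print(classify_substitution("C", "A"))  # pyrimidine_to_purine_transversion
```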

pyrosequenced_read [SO_0001424]

A read produced by pyrosequencing technology. An example is a read produced by Roche 454 technology.

pyrrolysine [SO_0001456]

A relatively rare amino acid encoded by the codon UAG in some contexts, whereas UAG is a termination codon in other contexts. A place holder for a cross product with chebi.

pyrrolysine_loss [SO_0002010]

A sequence variant whereby at least one base of a codon encoding pyrrolysine is changed, resulting in a different encoded amino acid. Request from Uma Devi Paila, UVA. Variants in the sites of rare amino acids e.g. Selenocysteine. These are important impact terms since a loss of such rare amino acids may lead to a loss of function.

pyrrolysine_tRNA_primary_transcript [SO_0001178]

A primary transcript encoding pyrrolysyl tRNA (SO:0000766).

pyrrolysyl_tRNA [SO_0000766]

A tRNA sequence that has a pyrrolysine anticodon, and a 3’ pyrrolysine binding region.

QTL [SO_0000771]

A quantitative trait locus (QTL) is a polymorphic locus which contains alleles that differentially affect the expression of a continuously distributed phenotypic trait. Usually it is a marker described by statistical association to quantitative variation in the particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci. Added in response to request by Simon Twigger November 14th 2005.

quality_value [SO_0001686]

An experimental feature attribute that defines the quality of the feature in a quantitative way, such as a phred quality score.
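
As an illustration of the phred quality score mentioned above: a phred score Q is conventionally related to the estimated base-calling error probability P by Q = -10 log10(P). The sketch below converts between the two; the function names are hypothetical and are not defined by SO.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to Q = -10 * log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10 * math.log10(p)
```

Under this convention a score of 20 corresponds to an error probability of about 1 in 100, and a score of 30 to about 1 in 1000.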

quantitative_variant [SO_0001774]

A variant within a gene that contributes to a quantitative trait such as height or weight.

queuosine [SO_0001317]

Queuosine is a modified 7-deazaguanosine.

R_five_prime_LTR_region [SO_0000427]

The R segment of the five-prime long terminal repeat.

R_GNA [SO_0001194]

An attribute describing a GNA sequence in the (R)-GNA enantiomer. Do not use this term for feature annotation. Use R_GNA_oligo (SO:0001195) instead.

R_GNA_oligo [SO_0001195]

An oligo composed of (R)-GNA residues.

R_LTR_region [SO_0000423]

The R segment of the long terminal repeats.

R_three_prime_LTR_region [SO_0000430]

The R segment of the three-prime long terminal repeat.

random_sequence [SO_0000449]

A sequence of nucleotides or amino acids which, by design, has a “random” order of components, given a predetermined input frequency of these components.

RAPD [SO_0001481]

RAPD is a ‘PCR product’ where a sequence variant is identified through the use of PCR with random primers.

rare_amino_acid_variant [SO_0002008]

A sequence variant whereby at least one base of a codon encoding a rare amino acid is changed, resulting in a different encoded amino acid. Request from Uma Devi Paila, UVA. Variants in the sites of rare amino acids e.g. Selenocysteine. These are important impact terms since a loss of such rare amino acids may lead to a loss of function.

rare_variant [SO_0001765]

When a variant from the genomic sequence is rarely found in the general population. The threshold for ‘rare’ varies between studies.

rasiRNA [SO_0000454]

A 17-28-nt, small interfering RNA derived from transcripts of repetitive elements.

rate_of_transcription_variant [SO_0001550]

A sequence variant that changes the rate of transcription with respect to a reference sequence.

rDNA_intergenic_spacer_element [SO_0001860]

A DNA motif that contains a core consensus sequence AGGTAAGGGTAATGCAC, is found in the intergenic regions of rDNA repeats, and is bound by an RNA polymerase I transcription termination factor (e.g. S. pombe Reb1). The S. pombe telomeric repeat consensus is TTAC(0-1)A(0-1)G(1-8). Page 208 of ISBN:978-0199638901

rDNA_replication_fork_barrier [SO_0001914]

A DNA motif that is found in eukaryotic rDNA repeats, and is a site of replication fork pausing. Requested by Midori - June 2012.

read [SO_0000150]

A sequence obtained from a single sequencing experiment. Typically a read is produced when a base calling program interprets information from a chromatogram trace file produced from a sequencing machine.

read_pair [SO_0000007]

One of a pair of sequencing reads in which the two members of the pair are related by originating at either end of a clone insert.

reading_frame [SO_0000717]

A nucleic acid sequence that when read as sequential triplets, has the potential of encoding a sequential string of amino acids. It need not contain the start or stop codon. This term was added after a request by SGD. August 2004. Modified after SO meeting in Cambridge to not include start or stop.
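
To make "read as sequential triplets" concrete, the sketch below splits a nucleotide string into codons for one of the three forward frames. The function name `codons_in_frame` and the zero-based `frame` argument are assumptions made for this example only. In keeping with the definition, no start or stop codon is required.

```python
from typing import List


def codons_in_frame(seq: str, frame: int = 0) -> List[str]:
    """Split seq into complete triplets, starting at offset frame (0, 1 or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    # Stop at len(seq) - 2 so that a trailing partial triplet is dropped.
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
```

For the sequence `ATGGCCTAA`, frame 0 yields `ATG`, `GCC`, `TAA`, while frame 1 yields `TGG`, `CCT`.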

reagent [SO_0000695]

A sequence used in an experiment. Requested by Lynn Crosby, Jan 2006.

reagent_attribute [SO_0000786]

Added jan 2006 by KE. [reagent attribute; reagent_attribute]

rearranged_at_DNA_level [SO_0000904]

An attribute to describe the sequence of a feature, where the DNA is rearranged.

rearrangement_region [SO_0001872]

A region of a chromosome, where the chromosome has undergone a large structural rearrangement that altered the genome organization. There is no longer synteny to the reference genome. NCBI definition: An orphan rearrangement between chromosomal location observed in isolation.

reciprocal [SO_0001521]

When translocation occurs between nonhomologous chromosomes and involves an equal exchange of genetic material.

reciprocal_chromosomal_translocation [SO_1000048]

A chromosomal translocation with two breaks; two chromosome segments have simply been exchanged.

recoded [SO_0000881]

An attribute describing an mRNA sequence that has been reprogrammed at translation, causing localized alterations.

recoded_by_translational_bypass [SO_0000886]

Recoded mRNA where a block of nucleotides is not translated.

recoded_codon [SO_0000145]

A codon that has been redefined at translation. The redefinition may be as a result of translational bypass, translational frameshifting or stop codon readthrough.

recoded_mRNA [SO_1001261]

The sequence of a mature mRNA transcript, modified before translation or during translation, usually by special cis-acting signals.

recoding_pseudoknot [SO_0000545]

The pseudoknots involved in recoding are unique in that, as they play their role as a structure, they are immediately unfolded and their now linear sequence serves as a template for decoding.

recoding_stimulatory_region [SO_1001268]

A site in an mRNA sequence that stimulates the recoding of a region in the same mRNA.

recombination_enhancer [SO_0002059]

A regulatory_region that promotes or induces the process of recombination.

recombination_feature [SO_0000298]

A feature where there has been exchange of genetic material in the event of mitosis or meiosis.

recombination_feature_of_rearranged_gene [SO_0000300]

A location where a gene is rearranged due to recombination during mitosis or meiosis.

recombination_hotspot [SO_0000339]

A region in a genome which promotes recombination.

recombination_regulatory_region [SO_0001681]

A regulatory region that is involved in the control of the process of recombination.

recombination_signal_sequence [SO_0001532]

A region recognized by a recombinase.

recombinationally_inverted_gene [SO_0000373]

A recombinationally rearranged gene by inversion.

recombinationally_rearranged [SO_0000940]

A gene that is recombinationally rearranged.

recombinationally_rearranged_gene [SO_0000456]

A gene that is recombinationally rearranged.

recombinationally_rearranged_vertebrate_immune_system_gene [SO_0000941]

A recombinationally rearranged gene of the vertebrate immune system.

recursive_splice_site [SO_0000998]

A recursive splice site is a splice site which subdivides a large intron. Recursive splicing is a mechanism that splices large introns by subdividing the intron at non-exonic elements and alternate exons.

reference genome sequence [SO_0001505]

A collection of sequences (often chromosomes) taken as the standard for a given organism and genome assembly.

region [SO_0000001]

A sequence_feature with an extent greater than zero. A nucleotide region is composed of bases and a polypeptide region is composed of amino acids.

regional_centromere [SO_0001795]

A regional centromere is a large modular centromere found in fission yeast and higher eukaryotes. It consists of a central core region flanked by inverted inner and outer repeat regions.

regional_centromere_central_core [SO_0001796]

A conserved region within the central region of a modular centromere, where the kinetochore is formed.

regional_centromere_inner_repeat_region [SO_0001798]

The inner inverted repeat region of a modular centromere and part of the central core surrounding a non-conserved central region. This region is adjacent to the central core, on each chromosome arm.

regional_centromere_outer_repeat_region [SO_0001799]

The heterochromatic outer repeat region of a modular centromere. These repeats exist in tandem arrays on both chromosome arms.

regional_centromere_outer_repeat_transcript [SO_0001905]

A transcript that is transcribed from the outer repeat region of a regional centromere.

regulated [SO_0000119]

An attribute to describe a sequence that is regulated.

regulatory_promoter_element [SO_0001678]

A promoter element that is not part of the core promoter, but provides the promoter with a specific regulatory region.

regulatory_region_ablation [SO_0001894]

A feature ablation whereby the deleted region includes a regulatory region. Created in conjunction with the EBI.

regulatory_region_amplification [SO_0001891]

A feature amplification of a region containing a regulatory region. Created in conjunction with the EBI.

regulatory_region_fusion [SO_0001887]

A feature fusion where the deletion brings together regulatory regions. Created in conjunction with the EBI.

regulatory_region_translocation [SO_0001884]

A feature translocation where the region contains a regulatory region. Created in conjunction with the EBI.

regulatory_region_variant [SO_0001566]

A sequence variant located within a regulatory region. EBI term: Regulatory region variations - In regulatory region annotated by Ensembl.

remark [SO_0000700]

A comment about the sequence.

repeat_component [SO_0000840]

A region of a repeated sequence. A manufactured term to group the parts of repeats, to give them an is_a path back to the root.

repeat_family [SO_0000187]

A group of characterized repeat sequences.

repeat_fragment [SO_0001050]

A portion of a repeat, interrupted by the insertion of another element. Requested by Chris Smith, and others at Flybase to help annotate nested repeats.

repeat_region [SO_0000657]

A region of sequence containing one or more repeat units.

repeat_unit [SO_0000726]

The simplest repeated component of a repeat region. A single repeat. Added to comply with the feature table.

repetitive_element [SO_0000292]

[repetitive_element]

replication_regulatory_region [SO_0001682]

A regulatory region that is involved in the control of the process of nucleotide replication.

replicon [SO_0001235]

A region containing at least one unique origin of replication and a unique termination site.

rescue [SO_0000814]

An attribute describing a region’s ability, when introduced to a mutant organism, to re-establish (rescue) a phenotype.

rescue_gene [SO_0000816]

A gene that rescues.

rescue_mini_gene [SO_0000795]

A mini_gene that rescues.

rescue_region [SO_0000411]

A region that rescues.

resolution_site [SO_0000947]

A region specifically recognized by a recombinase, which separates a physically contiguous circle of DNA into two physically separate circles.

restriction_enzyme_assembly_scar [SO_0001953]

A region of DNA sequence formed from the ligation of two sticky ends where the palindrome is broken and no longer comprises the recognition site and thus cannot be re-cut by the restriction enzymes used to create the sticky ends.

restriction_enzyme_binding_site [SO_0000061]

A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues of a restriction enzyme. A region of a molecule that binds to a restriction enzyme.

restriction_enzyme_cleavage_junction [SO_0001688]

The boundary at which a restriction enzyme breaks the nucleotide sequence.

restriction_enzyme_cut_site [SO_0000168]

A specific nucleotide sequence of DNA at or near which a particular restriction enzyme cuts the DNA.

restriction_enzyme_five_prime_single_strand_overhang [SO_0001932]

A terminal region of DNA sequence where the end of the region is not blunt ended and the exposed single strand terminates at the 5’ end.

restriction_enzyme_recognition_site [SO_0001687]

The nucleotide region (usually a palindrome) that is recognized by a restriction enzyme. This may or may not be equal to the restriction enzyme binding site.
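
In this context "palindrome" means a site that reads the same 5' to 3' on both strands, that is, a sequence equal to its own reverse complement. The sketch below tests that property for an unambiguous DNA string; the helper names are hypothetical, and the EcoRI site GAATTC is used only as a familiar example.

```python
# Translation table mapping each DNA base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic_site(seq: str) -> bool:
    """True if the site equals its own reverse complement."""
    s = seq.upper()
    return s == reverse_complement(s)
```

`is_palindromic_site("GAATTC")` is true, whereas an asymmetric site such as `GGATG` returns false, which is the case the "usually" in the definition allows for.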

restriction_enzyme_region [SO_0001954]

A region related to restriction enzyme function. Not a great term for annotation, but used to classify the various regions related to restriction enzymes.

restriction_enzyme_single_strand_overhang [SO_0001695]

A terminal region of DNA sequence where the end of the region is not blunt ended.

restriction_enzyme_three_prime_single_strand_overhang [SO_0001933]

A terminal region of DNA sequence where the end of the region is not blunt ended and the exposed single strand terminates at the 3’ end.

retinoic_acid_responsive_element [SO_0001653]

A transcription factor binding site of variable direct repeats of the sequence PuGGTCA spaced by five nucleotides (DR5) found in the promoters of retinoic acid-responsive genes, to which retinoic acid receptors bind.
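
The DR5 arrangement described above (two PuGGTCA half-sites separated by five nucleotides) can be written as a simple pattern. The sketch below is an illustration only: it scans one strand, demands exact half-site matches, and uses the hypothetical name `find_dr5_sites`. Real response elements are more variable than this.

```python
import re
from typing import List

# Pu is a purine (A or G); the spacer is any five unambiguous bases.
_DR5_PATTERN = re.compile(r"[AG]GGTCA[ACGT]{5}[AG]GGTCA")


def find_dr5_sites(seq: str) -> List[int]:
    """Return zero-based start positions of exact DR5 matches on the given strand."""
    return [m.start() for m in _DR5_PATTERN.finditer(seq.upper())]
```

A call such as `find_dr5_sites("TTAGGTCAACGTAGGGTCATT")` reports a single site starting at position 2.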

retrogene [SO_0001219]

A gene that has been produced as the product of a reverse transcriptase mediated event.

retron [SO_1001275]

Sequence coding for a short, single-stranded, DNA sequence via a retrotransposed RNA intermediate; characteristic of some microbial genomes.

reverse [SO_0001031]

Reverse is an attribute of the feature, where the feature is in the 3’ to 5’ direction. Again could be applied to primer.

reverse_Hoogsteen_base_pair [SO_0000501]

A type of non-canonical base-pairing.

reverse_primer [SO_0000132]

A single stranded oligo used for polymerase chain reaction. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

RH_map [SO_0001252]

A radiation hybrid map is a physical map.

rho_dependent_bacterial_terminator [SO_0000981]

A transcription terminator that is dependent upon Rho.

rho_independent_bacterial_terminator [SO_0000982]

A transcription terminator that is not dependent upon Rho. Rather, the mRNA contains a sequence that allows it to base-pair with itself and make a stem-loop structure.

ribonuclease_site [SO_0001977]

A region of a transcript encoding the cleavage site for a ribonuclease enzyme.

ribosome_entry_site [SO_0000139]

Region in mRNA where ribosome assembles.

riboswitch [SO_0000035]

A riboswitch is a part of an mRNA that can act as a direct sensor of small molecules to control its own expression. A riboswitch is a cis element in the 5’ end of an mRNA, that acts as a direct sensor of metabolites.

ribothymidine [SO_0001232]

A modified RNA base in which thymine is bound to the ribose ring. The free molecule is CHEBI:30832.

ribozymic [SO_0001186]

An attribute describing the sequence of a transcript that has catalytic activity even without an associated ribonucleoprotein. Do not use this for feature annotation. Use ribozyme (SO:0000374) instead.

right_handed_peptide_helix [SO_0001116]

A right handed helix is a region of peptide where the coiled conformation turns in a clockwise, right handed screw.

ring_chromosome [SO_1000045]

A ring chromosome is a chromosome whose arms have fused together to form a ring, often with the loss of the ends of the chromosome.

RNA [SO_0000356]

An attribute describing a sequence consisting of nucleobases bound to a repeating unit made of a D-ribose ring connected to a phosphate backbone. This term is mapped to MGED. Do not obsolete without consulting MGED ontology.

RNA_6S [SO_0000376]

A small (184-nt in E. coli) RNA that forms a hairpin type structure. 6S RNA associates with RNA polymerase in a highly specific manner. 6S RNA represses expression from a sigma70-dependent promoter during stationary phase.

RNA_aptamer [SO_0000033]

RNA molecules that have been selected from random pools based on their ability to bind other molecules.

RNA_chromosome [SO_0000961]

Structural unit composed of a self-replicating, RNA molecule.

RNA_hook_turn [SO_0000027]

[RNA_hook_turn; RNA hook turn; RNA_junction_loop; hook-turn motif]

RNA_internal_loop [SO_0000020]

A region of double stranded RNA where the bases do not conform to WC base pairing. The loop is closed on both sides by canonical base pairing. If the interruption to base pairing occurs on one strand only, it is known as a bulge.

RNA_junction_loop [SO_0000026]

[RNA junction loop; RNA_junction_loop; RNA_motif]

RNA_motif [SO_0000715]

A motif that is active in RNA sequence.

RNA_polymerase_II_TATA_box [SO_0001661]

A TATA box core promoter of a gene transcribed by RNA polymerase II.

RNA_polymerase_III_TATA_box [SO_0001662]

A TATA box core promoter of a gene transcribed by RNA polymerase III.

RNA_polymerase_promoter [SO_0001203]

A region (DNA) to which RNA polymerase binds, to begin transcription. Term merged with promoter SO:0000167 in August 2020 as part of GREEKC initiative. See GitHub Issue 492 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/492)

RNA_replication_mode [SO_0000972]

This has been obsoleted as it represents a process. replaced_by: GO:0034961. [RNA_replication_mode; RNA replication mode]

RNA_sequence_secondary_structure [SO_0000122]

A folded RNA sequence.

RNA_stability_element [SO_0001957]

RNA secondary structure that affects the stability of an RNA molecule.

RNA_stability_element [SO_0001979]

A motif that affects the stability of RNA.

RNAi_reagent [SO_0000337]

A double stranded RNA duplex, at least 20bp long, used experimentally to inhibit gene function by RNA interference.

RNApol_I_promoter [SO_0000169]

A DNA sequence in eukaryotic DNA to which RNA polymerase I binds, to begin transcription. parent term RNA_polymerase_promoter SO:0001203 was obsoleted in Aug 2020, so term has been moved to eukaryotic_promoter SO:0002221.

RNApol_II_core_promoter [SO_0001669]

The minimal portion of the promoter required to properly initiate transcription in RNA polymerase II transcribed genes.

RNApol_II_promoter [SO_0000170]

A DNA sequence in eukaryotic DNA to which RNA polymerase II binds, to begin transcription. parent term RNA_polymerase_promoter SO:0001203 was obsoleted in Aug 2020, so term has been moved to eukaryotic_promoter SO:0002221.

RNApol_II_promoter_region [SO_0000844]

A region of sequence which is a promoter for RNA polymerase II. This is a manufactured term to allow the parts of RNApol_II_promoter to have an is_a path back to the root.

RNApol_III_promoter [SO_0000171]

A DNA sequence in eukaryotic DNA to which RNA polymerase III binds, to begin transcription. parent term RNA_polymerase_promoter SO:0001203 was obsoleted in Aug 2020, so term has been moved to eukaryotic_promoter SO:0002221.

RNApol_III_promoter_type_1 [SO_0000617]

This type of promoter recruits RNA pol III. This promoter is intragenic and includes an A box, an intermediate element, and a C box. This is well conserved in the 5S rRNA promoters across species.

RNApol_III_promoter_type_1_region [SO_0000845]

A region of sequence which is a promoter for RNA polymerase III type 1. This is a manufactured term to allow the parts of RNApol_III_promoter_type_1 to have an is_a path back to the root.

RNApol_III_promoter_type_2 [SO_0000618]

This type of promoter recruits RNA pol III to transcribe genes mainly for tRNA. This promoter is intragenic and includes an A box and a B box.

RNApol_III_promoter_type_2_region [SO_0000846]

A region of sequence which is a promoter for RNA polymerase III type 2. This is a manufactured term to allow the parts of RNApol_III_promoter_type_2 to have an is_a path back to the root.

RNApol_III_promoter_type_3 [SO_0000621]

This type of promoter recruits RNA pol III to transcribe predominantly noncoding RNAs. This promoter contains a proximal sequence element (PSE) and a TATA box upstream of the gene that it regulates. Transcription can also be activated by a distal sequence element (DSE), which is located further upstream.

RNase_MRP_RNA [SO_0000385]

The RNA molecule essential for the catalytic activity of RNase MRP, an enzymatically active ribonucleoprotein with two distinct roles in eukaryotes. In mitochondria it plays a direct role in the initiation of mitochondrial DNA replication. In the nucleus it is involved in precursor rRNA processing, where it cleaves the internal transcribed spacer 1 between 18S and 5.8S rRNAs.

RNase_MRP_RNA_gene [SO_0001640]

A gene that encodes an RNase_MRP_RNA.

RNase_P_RNA [SO_0000386]

The RNA component of Ribonuclease P (RNase P), a ubiquitous endoribonuclease, found in archaea, bacteria and eukarya as well as chloroplasts and mitochondria. Its best characterized activity is the generation of mature 5 prime ends of tRNAs by cleaving the 5 prime leader elements of precursor-tRNAs. Cellular RNase Ps are ribonucleoproteins. RNA from bacterial RNase Ps retains its catalytic activity in the absence of the protein subunit, i.e. it is a ribozyme. Isolated eukaryotic and archaeal RNase P RNA has not been shown to retain its catalytic function, but is still essential for the catalytic activity of the holoenzyme. Although the archaeal and eukaryotic holoenzymes have a much greater protein content than the bacterial ones, the RNA cores from all the three lineages are homologous. Helices corresponding to P1, P2, P3, P4, and P10/11 are common to all cellular RNase P RNAs. Yet, there is considerable sequence variation, particularly among the eukaryotic RNAs.

RNase_P_RNA_gene [SO_0001639]

A gene that encodes an RNase P RNA.

RprA_RNA [SO_0000387]

Translational regulation of the stationary phase sigma factor RpoS is mediated by the formation of a double-stranded RNA stem-loop structure in the upstream region of the rpoS messenger RNA, occluding the translation initiation site. Clones carrying rprA (RpoS regulator RNA) increased the translation of RpoS. The rprA gene encodes a 106 nucleotide regulatory RNA. As with DsrA Rfam:RF00014, RprA is predicted to form three stem-loops. Thus, at least two small RNAs, DsrA and RprA, participate in the positive regulation of RpoS translation. Unlike DsrA, RprA does not have an extensive region of complementarity to the RpoS leader, leaving its mechanism of action unclear. RprA is non-essential.

RR_tract [SO_0000435]

A polypurine tract within an LTR_retrotransposon.

RRE_RNA [SO_0000388]

The Rev response element (RRE) is encoded within the HIV-env gene. Rev is an essential regulatory protein of HIV that binds an internal loop of the RRE, encouraging further Rev-RRE binding. This RNP complex is critical for mRNA export and hence for expression of the HIV structural proteins.

rRNA [SO_0000252]

rRNA is an RNA component of a ribosome that can provide both structural scaffolding and catalytic activity. Definition updated 10 June 2021 as part of restructuring rRNA terms and reforming definitions to have similar structures. Request from EBI. See GitHub Issue #493

rRNA_21S [SO_0001171]

A component of the large ribosomal subunit in mitochondrial rRNA. This term has been merged into mt_LSU_rRNA (SO:0002345) as part of reorganization of rRNA child terms 10 June 2021. Requested by EBI. See GitHub Issue #493.

rRNA_25S [SO_0001002]

Cytosolic 25S rRNA is an RNA component of the large subunit of cytosolic ribosomes in most eukaryotes. Renamed from rRNA_5S to cytosolic_5S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493.

rRNA_cleavage_RNA [SO_0005843]

An ncRNA that is part of a ribonucleoprotein that cleaves the primary pre-rRNA transcript in the process of producing mature rRNA molecules.

rRNA_cleavage_snoRNA_primary_transcript [SO_0000582]

A primary transcript encoding an rRNA cleavage snoRNA.

rRNA_encoding [SO_0000573]

A region that can be transcribed into a ribosomal RNA (rRNA).

rRNA_gene [SO_0001637]

A gene that encodes ribosomal RNA.

rRNA_large_subunit_primary_transcript [SO_0000325]

A primary transcript encoding a large ribosomal subunit RNA.

rRNA_primary_transcript [SO_0000209]

A primary transcript encoding a ribosomal RNA.

rRNA_primary_transcript_region [SO_0000838]

A region of an rRNA primary transcript. To allow transcribed_spacer_region to have a path to the root.

rRNA_small_subunit_primary_transcript [SO_0000255]

A primary transcript encoding a small ribosomal subunit RNA.

RST [SO_0001467]

A tag produced from a single sequencing read from a RACE product; typically a few hundred base pairs long.

RST_match [SO_0001471]

A match against an RST sequence.

S_GNA [SO_0001196]

An attribute describing a GNA sequence in the (S)-GNA enantiomer. Do not use this term for feature annotation. Use S_GNA_oligo (SO:0001197) instead.

S_GNA_oligo [SO_0001197]

An oligo composed of (S)-GNA residues.

S_region [SO_0001836]

The switch region of immunoglobulin heavy chains; it is involved in the rearrangement of heavy chain DNA leading to the expression of different immunoglobulin classes from the same B-cell.

SAGE_tag [SO_0000326]

A short diagnostic sequence tag, produced by serial analysis of gene expression (SAGE), that allows the quantitative and simultaneous analysis of a large number of transcripts.

Sap1_recognition_motif [SO_0001864]

A DNA motif to which the S. pombe Sap1 protein binds. The consensus sequence is 5’-TARGCAGNTNYAACGMG-3’; it is found at the mating type locus, where it is important for mating type switching, and at replication fork barriers in rDNA repeats.
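
The consensus above uses IUPAC ambiguity codes (R = A or G, Y = C or T, M = A or C, N = any base). One way to test a candidate site against such a consensus is to expand it into a regular expression, as sketched below. The function names are hypothetical, and matching is exact and single-stranded, which is a simplification.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(consensus: str) -> str:
    """Expand an IUPAC consensus into a regex, one character class per ambiguous position."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]  # raises KeyError on an unknown code
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def matches_consensus(seq: str, consensus: str) -> bool:
    """True if seq matches the consensus over its full length."""
    return re.fullmatch(iupac_to_regex(consensus), seq.upper()) is not None
```

With the Sap1 consensus `TARGCAGNTNYAACGMG`, the expanded pattern is `TA[AG]GCAG[ACGT]T[ACGT][CT]AACG[AC]G`.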

sarcin_like_RNA_motif [SO_0000024]

A loop in ribosomal RNA containing the sites of attack for ricin and sarcin.

scaRNA [SO_0002095]

A ncRNA, specific to the Cajal body, that has been demonstrated to function as a guide RNA in the site-specific synthesis of 2’-O-ribose-methylated nucleotides and pseudouridines in the RNA polymerase II-transcribed U1, U2, U4 and U5 spliceosomal small nuclear RNAs (snRNAs). Moved from is_a ncRNA (SO:0000655) to is_a snoRNA (SO:0000275) as per request from FlyBase by Dave Sant 24 April 2021. See GitHub Issue #509.

schellmann_loop [SO_0001123]

A motif of six or seven consecutive residues that contains two H-bonds.

schellmann_loop_seven [SO_0001124]

Wild type: A motif of seven consecutive residues that contains two H-bonds in which: the main-chain CO of residue(i) is H-bonded to the main-chain NH of residue(i+6), the main-chain CO of residue(i+1) is H-bonded to the main-chain NH of residue(i+5).

schellmann_loop_six [SO_0001125]

Common Type: A motif of six consecutive residues that contains two H-bonds in which: the main-chain CO of residue(i) is H-bonded to the main-chain NH of residue(i+5), the main-chain CO of residue(i+1) is H-bonded to the main-chain NH of residue(i+4).

score [SO_0001685]

The score of an experimentally derived feature such as a p-value.

scRNA [SO_0000013]

A small non coding RNA sequence, present in the cytoplasm.

scRNA_encoding [SO_0000575]

A region that can be transcribed into a small cytoplasmic RNA (scRNA).

scRNA_gene [SO_0001266]

A gene that encodes a small noncoding RNA that is generally found only in the cytoplasm.

scRNA_primary_transcript [SO_0000012]

The primary transcript of any one of several small cytoplasmic RNA molecules present in the cytoplasm and sometimes nucleus of a Eukaryote.

SECIS_element [SO_1001274]

The incorporation of selenocysteine into a protein sequence is directed by an in-frame UGA codon (usually a stop codon) within the coding region of the mRNA. Selenoprotein mRNAs contain a conserved secondary structure in the 3’ UTR that is required for the distinction of UGA stop from UGA selenocysteine. The selenocysteine insertion sequence (SECIS) is around 60 nt in length and adopts a hairpin structure which is sufficiently well-defined and conserved to act as a computational screen for selenoprotein genes.

selenocysteine [SO_0001455]

A relatively rare amino acid encoded by the codon UGA in some contexts, whereas UGA is a termination codon in other contexts. A place holder for a cross product with chebi.

selenocysteine_loss [SO_0002009]

A sequence variant whereby at least one base of a codon encoding selenocysteine is changed, resulting in a different encoded amino acid. Request from Uma Devi Paila, UVA. Variants in the sites of rare amino acids e.g. Selenocysteine. These are important impact terms since a loss of such rare amino acids may lead to a loss of function.

selenocysteine_tRNA_primary_transcript [SO_0005856]

A primary transcript encoding selenocysteinyl tRNA (SO:0005857).

selenocysteinyl_tRNA [SO_0005857]

A tRNA sequence that has a selenocysteine anticodon, and a 3’ selenocysteine binding region.

self_cleaving_ribozyme [SO_0002231]

An RNA that catalyzes its own cleavage. Added as per request by John T. Sexton GitHub issue #470 (https://github.com/The-Sequence-Ontology/SO-Ontologies/issues/470)

sense_intronic_ncRNA [SO_0002131]

A long non-coding transcript found within an intron of a coding or non-coding gene, with no overlap of exonic sequence.

sense_overlap_ncRNA [SO_0002132]

A long non-coding transcript that contains a protein coding gene within its intronic sequence on the same strand, with no overlap of exonic sequence.

sequence variant_affecting_transcript_stability [SO_1000082]

Sequence variant affects the stability of the transcript. OBSOLETE: This term was deleted as it conflated more than one term. The alteration is separate from the effect.

sequence_alteration [SO_0001059]

A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence. 1. A ‘sequence alteration’ is an allele whose sequence