This is the home of the BEL Language Documentation v2.0.

Please cite the BEL v2.0 language documentation as "BEL v2.0 Language Documentation, https://github.com/OpenBEL/language", along with the date accessed.

Overview

The Biological Expression Language (BEL) is a language for representing biological observations in a computable form, along with contextual information. BEL is intended as a knowledge capture and interchange medium. BEL is used to qualitatively represent causal and correlative relationships involving biological measurements (e.g., RNA, protein, phosphorylated proteins). Each BEL statement stands alone as an individual observation or fact, and can be integrated with related observations into a cohesive network.

BEL is a human-readable and -writable language designed to be easy for life scientists to learn and use. BEL comprises a relatively small set of function and relationship types that can be used in conjunction with widely used vocabularies such as HGNC human gene symbols, Gene Ontology, ChEBI, and MeSH. As a language of discourse for biological findings, BEL is designed to be "white-boardable" as well as written.

BEL History

BEL was initially designed in 2003 at Selventa (operating as Genstruct®) by Dexter Pratt. BEL was designed with a focus on capturing qualitative causal relationships that could be used for inference. From 2003 to 2010, BEL evolved in response to daily use by scientists representing findings derived from tens of thousands of abstracts and full-text articles.

In 2011, it was proposed to make BEL an open standard. BEL has since been refined, formalized, and extended to meet the needs of a broader community to represent, manage, and share scientific findings in the life sciences. BEL and associated software were released as open-source technology in 2012. OpenBEL became a Linux Foundation Collaborative project in 2013.

BEL Version History

  • BEL v1.0 – initial open source release, 2012

  • BEL v2.0 – major revisions and refinements, 2014

Summary of Changes for BEL v2.0

These additions and modifications enhance the BEL language by providing new representation capability (e.g., DNA and RNA variants, protein cleavage fragments, cellular location of abundances) and enabling the use of external vocabularies (post-translational modifications, activities).

Variants

  • Now represent sequence variants at DNA, RNA, and protein levels.

  • Now represent multiple substitutions within the same gene/RNA/protein

  • New BEL abundance modifier function variant("") / var("") is used for most variant types, replacing substitution() / sub() and truncation() / trunc(). Human Genome Variation Society (HGVS) nomenclature (den Dunnen and Antonarakis, 2000) is adopted to describe variants within the var("") modifier function, expanding the supported types of variation to include insertions, deletions, and duplications as well as non-specific variants.

  • Usage of fus() changed. Instead of a modifier function for a gene/RNA/protein abundance, fus() is used to compose new entities that can be used in place of a namespace value for abundance functions.

Protein Cleavage Fragments

  • New abundance modifier function fragment() / frag() to be used within protein abundances to specify protein fragments based on amino acid sequence range.

Post-Translational Protein Modifications

  • The proteinModification() / pmod() abundance modifier function can now use external vocabularies (e.g., PSI-MOD) for modification types, enabling users to add types without requiring a language change.

  • Multiple pmod() expressions can now be used within a protein abundance.

Translocations and Cellular Location

  • New abundance modifier function to specify location - location() / loc().

  • Change in translocation() / tloc() function format, to explicitly add BEL location functions to location arguments.

Activity Functions

  • The ten distinct BEL activity functions, e.g., kinaseActivity() / kin(), catalyticActivity() / cat(), transcriptionalActivity() / tscript(), are consolidated to a single activity function activity() / act().

  • New modifier function molecularActivity() / ma() can be used to specify specific activity types, using external vocabularies, e.g., GO Molecular Function, or a default BEL vocabulary.

Regulates Relationship

  • New causal relationship regulates to represent cases where A is reported to affect B, but it cannot be determined if A increases or decreases B.

BEL Script Format Changes

  • Citation annotation requirement removed for Name field

  • Citation annotation DOI and URL added as accepted types

  • BEL Script Evidence Annotation renamed to Support

  • BEL version set in document header

1. Language Structure

Knowledge in BEL is expressed as BEL Statements. Generally, BEL Statements have the form of a subject - predicate - object triple, where the subject is a BEL Term, the predicate is one of the BEL relationship types (e.g., increases), and the object can be either a BEL Term or a BEL Statement. A BEL Statement may also consist of a subject term only.

BEL Terms are composed of BEL Functions applied to concepts referenced using Namespace identifiers. Each BEL Term represents either an abundance of a biological entity, e.g., human AKT1 protein, or a process such as apoptosis.

BEL Annotations are applied to BEL Statements to optionally express additional information about the statement itself such as the citation for the publication reporting the observation, or the context in which the observation was made (e.g., species, tissue, cell line).

1.1. Namespaces

BEL is specifically designed to adopt external vocabularies and ontologies, and to represent life-science knowledge in the language and schema of the organization collecting or using the knowledge. Thus, BEL Terms are defined by reference to concepts in external vocabularies, which provide a set of well-known domain values, such as the official human gene symbols provided by HGNC (http://www.genenames.org/). While we consider it good practice to define biological entities with respect to well-defined domains such as public ontologies, no specific vocabulary is essential to the use of BEL, and users are free to define and reference their own vocabularies as needed.

BEL uses Namespaces to unambiguously reference concepts. The user associates a Namespace prefix with an external vocabulary and uses the prefix to refer to elements of the vocabulary. For example, if we associate the Namespace prefix HGNC with the vocabulary of symbols managed by the HGNC committee, we can then compose BEL Terms by referencing the HGNC Namespace prefix and any concept from the HGNC namespace together with a relevant BEL Function, e.g., proteinAbundance(HGNC:AKT1) or rnaAbundance(HGNC:TNF).
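The prefix-and-value form described above can be illustrated with a short Python sketch. This is only an illustration, not part of the BEL specification or the BEL Framework; the helper name split_namespace_value() is hypothetical, and the handling of quoted values is inferred from examples such as CHEBI:"oxygen atom" elsewhere in this document.

```python
def split_namespace_value(text):
    """Split 'PREFIX:value' into (prefix, value); hypothetical helper."""
    prefix, sep, value = text.partition(":")
    if not sep or not prefix or not value:
        raise ValueError(f"not a namespace-qualified value: {text!r}")
    # Values containing spaces are written in double quotes; strip them.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return prefix, value


print(split_namespace_value("HGNC:AKT1"))
print(split_namespace_value('CHEBI:"oxygen atom"'))
```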

1.1.1. Equivalencing between Namespaces

Values from different Namespaces may correspond to the same biological concept. For example, the name AKT1 in the HGNC Namespace refers to the same gene referenced with ID 207 in the EGID (Entrez Gene Identifier) Namespace. The BEL Framework assembles knowledge into a cohesive network, mapping equivalent BEL Terms, e.g., proteinAbundance(HGNC:AKT1) and proteinAbundance(EGID:207), to a single node in the network. This correspondence of Namespace values is handled in the BEL Framework separately from BEL knowledge representation.
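The mapping of equivalent terms to a single node can be sketched as follows. This is a minimal Python illustration, not the BEL Framework's implementation; the EQUIVALENCES table, the concept key "gene:AKT1", and the canonical_term() helper are hypothetical names, and only the HGNC:AKT1 / EGID:207 pair comes from the text above.

```python
# Hypothetical equivalence table: each (namespace, value) pair maps to a
# shared concept key. Only the AKT1 / 207 pair is taken from the text.
EQUIVALENCES = {
    ("HGNC", "AKT1"): "gene:AKT1",
    ("EGID", "207"): "gene:AKT1",
}


def canonical_term(function, namespace, value):
    """Return a node key that is identical for equivalent BEL Terms."""
    concept = EQUIVALENCES.get((namespace, value), f"{namespace}:{value}")
    return (function, concept)


node_a = canonical_term("proteinAbundance", "HGNC", "AKT1")
node_b = canonical_term("proteinAbundance", "EGID", "207")
print(node_a == node_b)
```

Keeping the table separate from the term itself mirrors the point made above: equivalencing is handled apart from BEL knowledge representation.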

1.2. Terms

Two general categories of biological entities are represented as BEL Terms: abundances and processes.

1.2.1. Abundances

Life science experiments often measure the abundance of a type of thing in a given sample or set of samples. BEL Abundance Terms represent classes of abundance, the abundances of specific types of things. Examples include the protein abundance of TP53, the RNA abundance of CCND1, the abundance of the protein AKT1 phosphorylated at serine 21, or the abundance of the complex of the proteins CCND1 and CDK4.

1.2.2. Processes

BEL Process Terms represent classes of complex phenomena taking place at the level of the cell or the organism, such as the biological process of cell cycle or a disease process such as Cardiomyopathy. In other cases, BEL Terms may represent classes of specific molecular activities, such as the kinase activity of the AKT1 protein, or a specific chemical reaction like conversion of superoxides to hydrogen peroxide and oxygen.

Measurable biological parameters such as Blood Pressure or Body Temperature are represented as process BEL Terms. These BEL Terms denote biological activities that, when measured, are reduced to an output parameter.

1.2.3. BEL Terms as Functional Expressions

BEL Terms are denoted by expressions composed of a BEL Function and a list of arguments. BEL v2.0 specifies a set of approximately 20 functions allowed in term expressions.

The combination of a BEL function and its arguments fully specifies a BEL Term. The BEL Term expression f(a) denotes a BEL Term defined by function f() applied to an argument a. Wherever the same function is applied to the same arguments, the resulting BEL Term references the same biological entity.

The semantics of a BEL Term are determined by the function used in the term expression. For example, the function proteinAbundance() is defined such that any term expression using proteinAbundance() represents a class of abundance of protein. Many BEL functions take only single values as arguments, providing a structured method of using ontologies and vocabularies in BEL. For example, values in the HUGO Gene Nomenclature Committee (HGNC) vocabulary of official human gene symbols can be used to designate gene, RNA, and protein abundances. The function proteinAbundance() could then be applied to an HGNC gene symbol, AKT1 for example, to indicate the class of protein abundances produced by the AKT1 gene, producing the BEL Term proteinAbundance(HGNC:AKT1).
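The rule that a function and its arguments fully specify a term can be modeled, as a rough sketch, with an immutable value type in Python. The Term class below is a hypothetical illustration and not part of any BEL tooling; it treats arguments as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A BEL Term as a function name plus a tuple of arguments."""
    function: str
    arguments: tuple

    def __str__(self):
        return f"{self.function}({', '.join(str(a) for a in self.arguments)})"


t1 = Term("proteinAbundance", ("HGNC:AKT1",))
t2 = Term("proteinAbundance", ("HGNC:AKT1",))
t3 = Term("rnaAbundance", ("HGNC:AKT1",))

print(t1)        # renders the term expression
print(t1 == t2)  # same function, same arguments: same term
print(t1 == t3)  # different function: different term
```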

1.3. Statements

A BEL Statement represents an experimental observation, generally reported in a scientific publication or unpublished experimental data. Generally, BEL Statements express a causal or correlative relationship between two biological entities. Because BEL Terms are functionally composed, a BEL Statement can consist of a single BEL Term; this simple statement indicates that the biological entity represented by the term has been observed.

1.3.1. Example BEL Statements

Subject Term Only
complex(p(HGNC:CCND1), p(HGNC:CDK4))

The abundance of a complex formed from protein abundances designated by CCND1 and CDK4 in the HGNC namespace. This is a subject term only statement, and indicates that the entity specified by the term has been observed.

Causal
p(HGNC:CCND1) => act(p(HGNC:CDK4))

The abundance of the protein designated by CCND1 in the HGNC namespace directly increases the activity of the abundance of the protein designated by CDK4 in the HGNC namespace.

Causal
p(HGNC:BCL2) -| bp(MESHPP:Apoptosis)

The abundance of the protein designated by BCL2 in the HGNC namespace decreases the biological process designated by apoptosis in the MESHPP (phenomena and processes) namespace.

Nested Statement - Object Term is Statement
p(HGNC:GATA1) => ( act(p(HGNC:ZBTB16)) => r(HGNC:MPL) )

The abundance of the protein designated by GATA1 in the HGNC namespace directly increases the process in which the activity of the protein abundance designated by ZBTB16 in the HGNC namespace directly increases the abundance of RNA designated by MPL in the HGNC namespace.
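The statement shapes shown in these examples (subject term only, subject - relationship - object, and a nested statement as object) can be captured in a small recursive structure. The Python sketch below is illustrative only; the Statement class and its field names are hypothetical, and terms are kept as plain text rather than parsed.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Statement:
    subject: str                                # a BEL Term, kept as text
    relation: Optional[str] = None              # e.g. "=>" or "-|"
    obj: Union[str, "Statement", None] = None   # a BEL Term or nested Statement

    def render(self) -> str:
        if self.relation is None:
            return self.subject                 # subject-term-only statement
        if isinstance(self.obj, Statement):
            target = f"( {self.obj.render()} )"  # nested statement as object
        else:
            target = self.obj
        return f"{self.subject} {self.relation} {target}"


inner = Statement("act(p(HGNC:ZBTB16))", "=>", "r(HGNC:MPL)")
outer = Statement("p(HGNC:GATA1)", "=>", inner)
print(outer.render())
```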

1.4. Annotations

Each BEL Statement can optionally be annotated to express knowledge about the statement itself. Some important uses of annotations are to specify information about the:

  • biological system in which the observation represented by the statement was made

  • experimental methods used to demonstrate the observation

  • knowledge source on which the statement is based, such as the citation and specific text supporting the statement

Examples of annotations that could be associated with a BEL Statement are the:

  • PubMed ID specifying the publication in which the observation was reported,

  • Species, tissue, and cellular location in which the observations were made, and

  • Dosage, exposure and recovery time associated with the observation.

2. BEL Functions

This section provides a listing and explanation of all BEL functions that are included in the BEL v2.0 Language Specification.

2.1. Abundance Functions

The following BEL Functions represent classes of abundances of specific types of biological entities like RNAs, proteins, post-translationally modified proteins, and small molecules. Biological experiments frequently involve the manipulation and measurement of entities in samples. These BEL functions specify the type of entity referred to by a namespace value. For example, geneAbundance(HGNC:AKT1), rnaAbundance(HGNC:AKT1), and proteinAbundance(HGNC:AKT1) represent the abundances of the AKT1 gene, RNA, and protein, respectively.

2.1.1. abundance(), a()

abundance(ns:v) or a(ns:v) denotes the abundance of the entity designated by the value v in the namespace ns. abundance is a general abundance term that can be used for chemicals or other molecules not defined by a more specific abundance function. Gene, RNA, protein, and microRNA abundances should be represented using the appropriate specific abundance function.

Examples - small molecule and chemical
a(CHEBI:"oxygen atom")
a(CHEBI:thapsigargin)

2.1.2. complexAbundance(), complex()

The complexAbundance() or complex() function can be used with either a namespace value or with a list of abundance terms.

complexAbundance(ns:v) or complex(ns:v) denotes the abundance of the molecular complex designated by the value v in the namespace ns. This form is generally used to identify abundances of named complexes.

Example - named complex
complex(SCOMP:"AP-1 Complex")

complexAbundance(<abundance term list>) denotes the abundance of the molecular complex of members of the abundances denoted by <abundance term list>, a list of abundance terms supplied as arguments. The list is unordered, thus different orderings of the arguments should be interpreted as the same term. Members of a molecular complex retain their individual identities. The complexAbundance() function does not specify the duration or stability of the interaction of the members of the complex.

Example - composed complex
complex(p(HGNC:FOS), p(HGNC:JUN))
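Because the member list of a composed complex is unordered, software that compares terms needs some way to treat different orderings as the same term. One possible approach, sketched here in Python, is to sort the members before rendering. The sorting convention and the complex_term() helper are assumptions made for illustration; the specification only requires that orderings be interpreted as equivalent.

```python
def complex_term(*members):
    """Build a composed complex() term with members in a canonical order."""
    return "complex(" + ", ".join(sorted(members)) + ")"


a = complex_term("p(HGNC:FOS)", "p(HGNC:JUN)")
b = complex_term("p(HGNC:JUN)", "p(HGNC:FOS)")
print(a)
print(a == b)
```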

2.1.3. compositeAbundance(), composite()

The compositeAbundance(<abundance term list>) function takes a list of abundance terms. The compositeAbundance() or composite() function is used to represent cases where multiple abundances synergize to produce an effect. The list is unordered, thus different orderings of the arguments should be interpreted as the same term. This function should not be used if any of the abundances alone are reported to cause the effect. compositeAbundance() terms should be used only as subjects of statements, not as objects.

Example - BEL Statement with compositeAbundance term
composite(p(HGNC:IL6), complex(GOCC:"interleukin-23 complex")) increases bp(GOBP:"T-helper 17 cell differentiation")

In the above example, IL-6 and IL-23 synergistically induce Th17 differentiation.

2.1.4. geneAbundance(), g()

geneAbundance(ns:v) or g(ns:v) denotes the abundance of the gene designated by the value v in the namespace ns. geneAbundance() terms are used to represent the DNA encoding the specified gene. geneAbundance() is considered decreased in the case of a homozygous or heterozygous gene deletion, and increased in the case of a DNA amplification mutation. Events in which a protein binds to the promoter of a gene can be represented using the geneAbundance() function.

Example - promoter binding event represented using geneAbundance
complex(p(HGNC:TP53), g(HGNC:CDKN1A))

In the above example, the p53 protein binds the CDKN1A gene.

2.1.5. microRNAAbundance(), m()

microRNAAbundance(ns:v) or m(ns:v) denotes the abundance of the processed, functional microRNA designated by the value v in the namespace ns.

Example - microRNA abundance
m(HGNC:MIR21)

2.1.6. proteinAbundance(), p()

proteinAbundance(ns:v) or p(ns:v) denotes the abundance of the protein designated by the value v in the namespace ns, where v references a gene or a named protein family.

Examples - protein abundances
p(HGNC:AKT1)
p(SFAM:"AKT Family")

2.1.7. rnaAbundance(), r()

rnaAbundance(ns:v) or r(ns:v) denotes the abundance of the RNA designated by the value v in the namespace ns, where v references a gene. This function refers to all RNA designated by ns:v, regardless of splicing, editing, or polyadenylation stage.

Example - RNA abundance
r(HGNC:AKT1)

2.2. Abundance Modifier Functions

The following BEL functions are special functions that can be used only as an argument within an abundance function. These functions modify the abundance to specify sequence variations (gene, RNA, microRNA, protein), post-translational modifications (protein), fragment resulting from proteolytic processing (protein), or cellular location (most abundance types).

2.2.1. Protein Modifications

proteinModification(), pmod()

The proteinModification() or pmod() function can be used only as an argument within a proteinAbundance() function to indicate modification of the specified protein. Multiple modifications can be applied to the same protein abundance. Modified protein abundance term expressions have the general form:

p(ns:protein_value, pmod(ns:type_value, <code>, <pos>))

type_value (required) is a namespace value for the type of modification, <code> (optional) is a single-letter or three-letter code for one of the twenty standard amino acids, and <pos> (optional) is the position at which the modification occurs based on the reference sequence for the protein. If <pos> is omitted, then the position of the modification is unspecified. If both <code> and <pos> are omitted, then the residue and position of the modification are unspecified. NOTE - A default BEL namespace includes commonly used protein modification types.

Examples
AKT1 phosphorylated at Serine 473

default BEL namespace and 1-letter amino acid code:

p(HGNC:AKT1, pmod(Ph, S, 473))

default BEL namespace and 3-letter amino acid code:

p(HGNC:AKT1, pmod(Ph, Ser, 473))

PSI-MOD namespace and 3-letter amino acid code:

p(HGNC:AKT1, pmod(MOD:PhosRes, Ser, 473))

MAPK1 phosphorylated at both Threonine 185 and Tyrosine 187

default BEL namespace and 3-letter amino acid code:

p(HGNC:MAPK1, pmod(Ph, Thr, 185), pmod(Ph, Tyr, 187))

Palmitoylated HRAS

HRAS palmitoylated at an unspecified residue. Default BEL namespace:

p(HGNC:HRAS, pmod(Palm))

Modification Types Provided in Default BEL Namespace

Additional modification types can be requested as needed, or an external vocabulary can be used. Like other BEL namespace values, these modification types can be equivalenced to values in other vocabularies.

Label     Synonym
Ac        acetylation
ADPRib    ADP-ribosylation, ADP-rybosylation, adenosine diphosphoribosyl
Farn      farnesylation
Gerger    geranylgeranylation
Glyco     glycosylation
Hy        hydroxylation
ISG       ISGylation, ISG15-protein conjugation
Me        methylation
Me1       monomethylation, mono-methylation
Me2       dimethylation, di-methylation
Me3       trimethylation, tri-methylation
Myr       myristoylation
Nedd      neddylation
NGlyco    N-linked glycosylation
NO        Nitrosylation
OGlyco    O-linked glycosylation
Palm      palmitoylation
Ph        phosphorylation
Sulf      sulfation, sulphation, sulfur addition, sulphur addition, sulfonation, sulphonation
Sumo      SUMOylation
Ub        ubiquitination, ubiquitinylation, ubiquitylation
UbK48     Lysine 48-linked polyubiquitination
UbK63     Lysine 63-linked polyubiquitination
UbMono    monoubiquitination
UbPoly

Supported One- and Three-letter Amino Acid Codes

Amino Acid       1-Letter Code   3-Letter Code
Alanine          A               Ala
Arginine         R               Arg
Asparagine       N               Asn
Aspartic Acid    D               Asp
Cysteine         C               Cys
Glutamic Acid    E               Glu
Glutamine        Q               Gln
Glycine          G               Gly
Histidine        H               His
Isoleucine       I               Ile
Leucine          L               Leu
Lysine           K               Lys
Methionine       M               Met
Phenylalanine    F               Phe
Proline          P               Pro
Serine           S               Ser
Threonine        T               Thr
Tryptophan       W               Trp
Tyrosine         Y               Tyr
Valine           V               Val
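
The general form p(ns:protein_value, pmod(ns:type_value, <code>, <pos>)) and the amino acid codes listed above can be combined in a small helper that renders modified protein terms. The Python sketch below is illustrative only: the pmod() and modified_protein() helper names are hypothetical, normalizing one-letter codes to three-letter codes is a design choice rather than a requirement, and rejecting a position without a residue code is an assumption drawn from the positional argument order.

```python
# One- to three-letter amino acid codes, copied from the table above.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_LETTER = set(AMINO_ACIDS.values())


def pmod(type_value, code=None, pos=None):
    """Render a pmod() argument; code and pos are optional."""
    if pos is not None and code is None:
        # Assumption: arguments are positional, so pos needs a code before it.
        raise ValueError("a position requires an amino acid code")
    parts = [type_value]
    if code is not None:
        three = AMINO_ACIDS.get(code, code)  # normalize 1-letter to 3-letter
        if three not in THREE_LETTER:
            raise ValueError(f"unknown amino acid code: {code}")
        parts.append(three)
    if pos is not None:
        parts.append(str(pos))
    return "pmod(" + ", ".join(parts) + ")"


def modified_protein(protein, *modifications):
    """Render a protein abundance with zero or more pmod() arguments."""
    return "p(" + ", ".join((protein,) + modifications) + ")"


print(modified_protein("HGNC:AKT1", pmod("Ph", "S", 473)))
print(modified_protein("HGNC:MAPK1", pmod("Ph", "Thr", 185), pmod("Ph", "Tyr", 187)))
print(modified_protein("HGNC:HRAS", pmod("Palm")))
```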

2.2.2. Variants

variant(""), var("")

The variant("<expression>") or var("<expression>") function can be used as an argument within a geneAbundance(), rnaAbundance(), microRNAAbundance(), or proteinAbundance() to indicate a sequence variant of the specified abundance. The var("") function takes an HGVS variant description expression, e.g., for a substitution, insertion, or deletion variant. Multiple var("") arguments may be applied to an abundance term.

Protein examples
reference allele
p(HGNC:CFTR, var("="))

This is different from p(HGNC:CFTR), the root protein abundance, which includes all variants.

unspecified variant
p(HGNC:CFTR, var("?"))
substitution
p(HGNC:CFTR, var("p.Gly576Ala"))
p(REF:"NP_000483.3", var("p.Gly576Ala"))

CFTR substitution variant Glycine 576 Alanine (HGVS NP_000483.3:p.Gly576Ala). Because a specific position is referenced, a namespace value for a non-ambiguous sequence like the RefSeq ID in the lower example is preferred over the HGNC gene symbol. The p. within the var("") expression indicates that the numbering is based on a protein sequence.

deletion
p(HGNC:CFTR, var("p.Phe508del"))
p(REF:"NP_000483.3", var("p.Phe508del"))

CFTR ΔF508 variant (HGVS NP_000483.3:p.Phe508del). Because a specific position is referenced, a namespace value for a non-ambiguous sequence like the RefSeq ID in the lower example is preferred over the HGNC gene symbol. The p. within the var("") expression indicates that the numbering is based on a protein reference sequence.

frameshift
p(HGNC:CFTR, var("p.Thr1220Lysfs"))
p(REF:"NP_000483.3", var("p.Thr1220Lysfs"))

CFTR frameshift variant (HGVS NP_000483.3:p.Thr1220Lysfs*7). Because a specific position is referenced, a namespace value for a non-ambiguous sequence like the RefSeq ID in the lower example is preferred over the HGNC gene symbol. The p. within the var("") expression indicates that the numbering is based on a protein reference sequence.

DNA (gene) examples

These are all representations of CFTR ΔF508.

SNP
g(SNP:rs113993960, var("delCTT"))
chromosome
g(REF:"NC_000007.13", var("g.117199646_117199648delCTT"))
gene - coding DNA reference sequence
g(HGNC:CFTR, var("c.1521_1523delCTT"))
g(REF:"NM_000492.3", var("c.1521_1523delCTT"))

Because a specific position is referenced, a namespace value for a non-ambiguous sequence like the RefSeq ID in the lower example is preferred over the HGNC gene symbol. The c. within the var("") expression indicates that the numbering is based on a coding DNA reference sequence. The coding DNA reference sequence covers the part of the transcript that is translated into protein; numbering starts at the A of the initiating ATG codon, and ends at the last nucleotide of the translation stop codon.

RNA examples

These are all representations of CFTR ΔF508.

coding reference sequence
r(HGNC:CFTR, var("c.1521_1523delCTT"))
r(REF:"NM_000492.3", var("c.1521_1523delCTT"))

Because a specific position is referenced, a namespace value for a non-ambiguous sequence like the RefSeq ID in the lower example is preferred over the HGNC gene symbol. The c. within the var("") expression indicates that the numbering is based on a coding DNA reference sequence. The coding DNA reference sequence covers the part of the transcript that is translated into protein; numbering starts at the A of the initiating ATG codon, and ends at the last nucleotide of the translation stop codon.

RNA reference sequence
r(HGNC:CFTR, var("r.1653_1655delcuu"))
r(REF:"NM_000492.3", var("r.1653_1655delcuu"))

Because a specific position is referenced, a namespace value for a non-ambiguous sequence like the RefSeq ID in the lower example is preferred over the HGNC gene symbol. The r. within the var("") expression indicates that the numbering is based on an RNA reference sequence. The RNA reference sequence covers the entire transcript except for the poly A-tail; numbering starts at the transcription initiation site and ends at the transcription termination site.
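
The reference-sequence prefixes used in the var("") examples above can be recognized with a short lookup. The Python sketch below is an illustration and not an HGVS parser; the reference_type() helper is hypothetical. The meanings of p., c., and r. are taken from the text above, while the g. entry (genomic reference sequence) follows HGVS convention and is not spelled out in this document.

```python
import re

REFERENCE_TYPES = {
    "g": "genomic reference sequence",     # HGVS convention; see lead-in
    "c": "coding DNA reference sequence",
    "r": "RNA reference sequence",
    "p": "protein reference sequence",
}

_PREFIX = re.compile(r"^([gcrp])\.")


def reference_type(expression):
    """Return the reference sequence type named by the prefix, or None."""
    match = _PREFIX.match(expression)
    return REFERENCE_TYPES[match.group(1)] if match else None


for expr in ("p.Gly576Ala", "c.1521_1523delCTT", "r.1653_1655delcuu", "=", "?"):
    print(expr, "->", reference_type(expr))
```

Expressions without a prefix, such as "=", "?", or the "delCTT" used with a SNP identifier, return None in this sketch.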

2.2.3. Proteolytic fragments

fragment(""), frag("")

The fragment() or frag() function can be used within a proteinAbundance() term to specify a protein fragment, e.g., a product of proteolytic cleavage. Protein fragment expressions take the general form:

p(ns:v, frag(<range>, <descriptor>))

where <range> (required) is an amino acid range, and <descriptor> (optional) is any additional distinguishing information like fragment size or name.

Examples

For these examples, HGNC:YFG is ‘your favorite gene’. For the first four examples, only the <range> argument is used. The last example includes the optional <descriptor>.

fragment with known start/stop
p(HGNC:YFG, frag("5_20"))
amino-terminal fragment of unknown length
p(HGNC:YFG, frag("1_?"))
carboxyl-terminal fragment of unknown length
p(HGNC:YFG, frag("?_*"))
fragment with unknown start/stop
p(HGNC:YFG, frag("?"))
fragment with unknown start/stop and a descriptor
p(HGNC:YFG, frag("?", "55kD"))
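
The <range> forms used in these examples can be split into a start and a stop with a few lines of Python. This is a sketch under assumptions: the parse_fragment_range() helper is hypothetical, "?" is read as an unknown endpoint, and "*" is read as the carboxyl terminus, as the third example suggests.

```python
def parse_fragment_range(text):
    """Split a frag() range into (start, stop).

    Endpoints are returned as int, '?' (unknown) or '*' (assumed here to
    mean the carboxyl terminus).
    """
    if text == "?":
        return ("?", "?")
    start, sep, stop = text.partition("_")
    if not sep:
        raise ValueError(f"not a fragment range: {text!r}")

    def endpoint(token):
        if token in ("?", "*"):
            return token
        if token.isdigit():
            return int(token)
        raise ValueError(f"bad range endpoint: {token!r}")

    return (endpoint(start), endpoint(stop))


for rng in ("5_20", "1_?", "?_*", "?"):
    print(rng, parse_fragment_range(rng))
```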

2.2.4. Cellular location

location(), loc()

location() or loc() can be used as an argument within any abundance function except compositeAbundance() to represent a distinct subset of the abundance at that location. Location subsets of abundances have the general form:

f(ns:v, loc(ns:v))

Examples
Cytoplasmic pool of AKT1 protein
p(HGNC:AKT1, loc(MESHCS:Cytoplasm))
Endoplasmic Reticulum pool of Ca2+
a(CHEBI:"calcium(2+)", loc(GOCC:"endoplasmic reticulum"))

2.3. Process Functions

The following BEL Functions represent classes of events or phenomena taking place at the level of the cell or the organism which do not correspond to molecular abundances, but instead to a biological process like angiogenesis or a pathology like cancer.

2.3.1. biologicalProcess(), bp()

biologicalProcess(ns:v) or bp(ns:v) denotes the process or population of events designated by the value v in the namespace ns.

Examples
bp(GOBP:"cell cycle arrest")
bp(GOBP:angiogenesis)

2.3.2. pathology(), path()

pathology(ns:v) or path(ns:v) denotes the disease or pathology process designated by the value v in the namespace ns. The pathology() function is included to facilitate the distinction of pathologies from other biological processes because of their importance in many potential applications in the life sciences.

Examples
pathology(MESHD:"Pulmonary Disease, Chronic Obstructive")
pathology(MESHD:adenocarcinoma)

2.3.3. activity(), act()

activity(<abundance>) or act(<abundance>) is used to specify events resulting from the molecular activity of an abundance. The activity() function provides distinct terms that enable differentiation of the increase or decrease of the molecular activity of a protein from changes in the abundance of the protein. activity() can be applied to a protein, complex, or RNA abundance term, and modified with a molecularActivity() argument to indicate a specific type of molecular activity.

Example
act(p(HGNC:AKT1))

2.4. Process Modifier Function

2.4.1. molecularActivity(), ma()

molecularActivity(ns:v) or ma(ns:v) is used to denote a specific type of activity function within an activity() term.

NOTE - The default BEL namespace (DEFAULT) includes commonly used molecular activity types, mapping directly to the BEL v1.0 activity functions.

Examples
default BEL namespace, transcriptional activity (DEFAULT namespace is optional)
act(p(HGNC:FOXO1), ma(DEFAULT:tscript))
GO molecular function namespace, transcriptional activity
act(p(HGNC:FOXO1), ma(GO:"nucleic acid binding transcription factor activity"))
default BEL namespace, kinase activity
act(p(HGNC:AKT1), ma(kin))
GO molecular function namespace, kinase activity
act(p(HGNC:AKT1), ma(GO:"kinase activity"))
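
Because the default molecular activity types map directly to the BEL v1.0 activity functions, a v1.0 activity term can be rewritten mechanically into the v2.0 form. The Python sketch below illustrates the idea for the three short forms named in this document (kin, cat, tscript) only; the upgrade_activity() helper is hypothetical, it works on term text rather than a parsed term, and it is not an official conversion tool.

```python
import re

# Short forms of three BEL v1.0 activity functions named in this document.
V1_ACTIVITIES = ("kin", "cat", "tscript")
_V1_CALL = re.compile(r"^(%s)\((.+)\)$" % "|".join(V1_ACTIVITIES))


def upgrade_activity(term):
    """Rewrite e.g. kin(X) as act(X, ma(kin)); leave other terms unchanged."""
    match = _V1_CALL.match(term)
    if match is None:
        return term
    activity, abundance = match.groups()
    return f"act({abundance}, ma({activity}))"


print(upgrade_activity("kin(p(HGNC:AKT1))"))
print(upgrade_activity("tscript(p(HGNC:FOXO1))"))
```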

2.5. Transformation Functions

The following BEL functions represent transformations. Transformations are processes or events in which one class of abundance is transformed or changed into a second class of abundance by translocation, degradation, or participation in a reaction. All types of abundance terms except compositeAbundance() may be used within these transformation functions.

2.5.1. Translocations

BEL translocation functions include translocation() as well as cellSurfaceExpression() and cellSecretion(), two functions intended to provide a simple, standard means of expressing commonly represented translocations.

translocation(), tloc()

translocation(<abundance>, fromLocation(ns1:v1), toLocation(ns2:v2)) or tloc(<abundance>, fromLoc(ns1:v1), toLoc(ns2:v2)) denotes the frequency or number of events in which members of <abundance> move from the location designated by the value v1 in the namespace ns1 to the location designated by the value v2 in the namespace ns2. Translocation is applied to represent events on the cellular scale, like endocytosis and movement of transcription factors from the cytoplasm to the nucleus. Special-case translocations are handled by the BEL functions cellSecretion() and cellSurfaceExpression().

Example

endocytosis (translocation from the cell surface to the endosome) of the epidermal growth factor receptor (EGFR) protein can be represented as:

tloc(p(HGNC:EGFR), fromLoc(GOCC:"cell surface"), toLoc(GOCC:endosome))

cellSecretion(), sec()

cellSecretion(<abundance>) or sec(<abundance>) denotes the frequency or number of events in which members of <abundance> move from cells to regions outside of the cells. cellSecretion(<abundance>) can be equivalently expressed as:

tloc(<abundance>, fromLoc(GOCC:intracellular), toLoc(GOCC:"extracellular space"))

The intent of the cellSecretion() function is to provide a simple, standard means of expressing a commonly represented translocation.

cellSurfaceExpression(), surf()

cellSurfaceExpression(<abundance>) or surf(<abundance>) denotes the frequency or abundance of events in which members of <abundance> move to the surface of cells. cellSurfaceExpression(<abundance>) can be equivalently expressed as:

tloc(<abundance>, fromLoc(GOCC:intracellular), toLoc(GOCC:"cell surface"))

The intent of the cellSurfaceExpression() function is to provide a simple, standard means of expressing a commonly represented translocation.
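
The equivalences stated above for cellSecretion() and cellSurfaceExpression() can be applied mechanically. The following Python sketch expands either shorthand into the corresponding tloc() expression; the expand_translocation() helper and the TARGETS table are hypothetical names, and the function operates on term text rather than on a parsed term.

```python
import re

# Destination locations for the two shorthand functions, as given above.
TARGETS = {
    "sec": 'GOCC:"extracellular space"',
    "cellSecretion": 'GOCC:"extracellular space"',
    "surf": 'GOCC:"cell surface"',
    "cellSurfaceExpression": 'GOCC:"cell surface"',
}
_CALL = re.compile(r"^(\w+)\((.+)\)$")


def expand_translocation(term):
    """Expand sec()/surf() into the equivalent tloc(); pass others through."""
    match = _CALL.match(term)
    if match is None or match.group(1) not in TARGETS:
        return term
    name, abundance = match.groups()
    return (f"tloc({abundance}, fromLoc(GOCC:intracellular), "
            f"toLoc({TARGETS[name]}))")


print(expand_translocation("sec(p(HGNC:IL6))"))
print(expand_translocation("surf(p(HGNC:EGFR))"))
```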

2.5.2. degradation(), deg()

degradation(<abundance>) or deg(<abundance>) denotes the frequency or number of events in which a member of <abundance> is degraded in some way such that it is no longer a member of <abundance>. For example, degradation() is used to represent proteasome-mediated proteolysis. The BEL Framework automatically connects deg(<abundance>) to <abundance> such that:

deg(<abundance>) directlyDecreases <abundance>
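
The automatic connection described above can be pictured as a pass that adds one edge for every degradation term in a network. The Python sketch below is only an illustration of that rule; the implied_degradation_edges() helper is hypothetical and does not reflect how the BEL Framework is actually implemented.

```python
def implied_degradation_edges(terms):
    """Return (subject, relationship, object) edges implied by deg() terms."""
    edges = []
    for term in terms:
        if term.startswith("deg(") and term.endswith(")"):
            abundance = term[len("deg("):-1]
            edges.append((term, "directlyDecreases", abundance))
    return edges


network_terms = ["deg(p(HGNC:TP53))", "p(HGNC:MDM2)"]
for edge in implied_degradation_edges(network_terms):
    print(" ".join(edge))
```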

2.5.3. reaction(), rxn()

reaction(reactants(<abundance term list1>), products(<abundance term list2>)) denotes the frequency or abundance of events in which members of the abundances in <abundance term list1> (the reactants) are transformed into members of the abundances in <abundance term list2> (the products).

Example

The reaction in which superoxides are dismutated into oxygen and hydrogen peroxide can be represented as:

rxn(reactants(a(CHEBI:superoxide)), products(a(CHEBI:"hydrogen peroxide"), a(CHEBI:"oxygen")))

2.6. Other Functions

2.6.1. fusion(), fus()

fusion() or fus() expressions can be used in place of a namespace value within a gene, RNA, or protein abundance function to represent a hybrid gene, or gene product formed from two previously separate genes. fusion() expressions take the general form:

fus(ns5':v5', "range5'", ns3':v3', "range3'")

where ns5':v5' is a namespace and value for the 5' fusion partner, range5' is the sequence coordinates of the 5' partner, ns3':v3' is a namespace and value for the 3' partner, and range3' is the sequence coordinates for the 3' partner. Ranges need to be in quotes.

Example
RNA abundance of fusion with known breakpoints
r(fus(HGNC:TMPRSS2, "r.1_79", HGNC:ERG, "r.312_5034"))

The r. designation in the range fields indicates that the numbering uses the RNA sequence as the reference. RNA sequence numbering starts at the transcription initiation site. Use c. for g() fusions and p. for p() fusions. The r., c., and p. designations come from the HGVS variation description convention.

RNA abundance of fusion with unspecified breakpoints
r(fus(HGNC:TMPRSS2, "?", HGNC:ERG, "?"))
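The general form can be assembled programmatically. The Python sketch below builds a fusion term and checks that each range carries the reference prefix described above (c. for g(), r. for r(), p. for p()) or is the unspecified breakpoint "?". The function and table names are hypothetical, and the check is deliberately minimal: it does not validate the coordinates themselves.

```python
# Reference prefix expected in range fields for each abundance function.
RANGE_PREFIX = {"g": "c.", "r": "r.", "p": "p."}


def fusion_term(func, partner5, range5, partner3, range3):
    """Build e.g. r(fus(HGNC:TMPRSS2, "r.1_79", HGNC:ERG, "r.312_5034"))."""
    prefix = RANGE_PREFIX[func]  # KeyError for anything but g, r, p
    for rng in (range5, range3):
        if rng != "?" and not rng.startswith(prefix):
            raise ValueError(f"range {rng!r} should start with {prefix!r} or be '?'")
    return f'{func}(fus({partner5}, "{range5}", {partner3}, "{range3}"))'


print(fusion_term("r", "HGNC:TMPRSS2", "r.1_79", "HGNC:ERG", "r.312_5034"))
# r(fus(HGNC:TMPRSS2, "r.1_79", HGNC:ERG, "r.312_5034"))
print(fusion_term("r", "HGNC:TMPRSS2", "?", "HGNC:ERG", "?"))
# r(fus(HGNC:TMPRSS2, "?", HGNC:ERG, "?"))
```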

3. BEL Relationships

The following BEL Relationship types are included in the BEL v2.0 language specification:

The causal and correlative relationships are expected to be the most commonly used. Relationships that are not used in the written BEL language, but are introduced by the BEL Framework during compilation of a BEL network, are not covered in this document.

3.1. Causal Relationships

These relationship types denote a causal relationship, or the absence of a causal relationship between a subject and an object term.

3.1.1. increases, →

For terms A and B, A increases B or A → B indicates that increases in A have been observed to cause increases in B.

A increases B also represents cases where decreases in A have been observed to cause decreases in B, for example, in recording the results of gene deletion or other inhibition experiments.

A is a BEL Term and B is either a BEL Term or a BEL Statement.

The increases relationship does not indicate that changes in A are necessary for changes in B, nor does it indicate that changes in A are sufficient to cause changes in B.

3.1.2. directlyIncreases, ⇒

For terms A and B, A directlyIncreases B or A ⇒ B indicates that increases in A have been observed to cause increases in B and that the mechanism of the causal relationship is based on physical interaction of entities related to A and B. This is a direct version of the increases relationship.

3.1.3. decreases, -|

For terms A and B, A decreases B or A -| B indicates that increases in A have been observed to cause decreases in B.

A decreases B also represents cases where decreases in A have been observed to cause increases in B, for example, in recording the results of gene deletion or other inhibition experiments.

A is a BEL Term and B is either a BEL Term or a BEL Statement.

The decreases relationship does not indicate that changes in A are necessary for changes in B, nor does it indicate that changes in A are sufficient to cause changes in B.

3.1.4. directlyDecreases, =|

For terms A and B, A directlyDecreases B or A =| B indicates that increases in A have been observed to cause decreases in B and that the mechanism of the causal relationship is based on physical interaction of entities related to A and B. This is a direct version of the decreases relationship.

3.1.5. rateLimitingStepOf

For process, activity, or transformation term A and process term P, A rateLimitingStepOf P indicates both:

A subProcessOf P
A -> P

Example

The catalytic activity of HMG CoA reductase is a rate-limiting step for cholesterol biosynthesis:

act(p(HGNC:HMGCR), ma(cat)) rateLimitingStepOf bp(GOBP:"cholesterol biosynthetic process")
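Because a rateLimitingStepOf statement stands for both a subProcessOf statement and an increases statement, the expansion can be sketched as a small Python helper. The function name is hypothetical and the terms are handled as plain text.

```python
def expand_rate_limiting_step(step: str, process: str) -> list:
    """Return the two statements implied by '<step> rateLimitingStepOf <process>'."""
    return [
        f"{step} subProcessOf {process}",
        f"{step} -> {process}",
    ]


for statement in expand_rate_limiting_step(
    "act(p(HGNC:HMGCR), ma(cat))",
    'bp(GOBP:"cholesterol biosynthetic process")',
):
    print(statement)
# act(p(HGNC:HMGCR), ma(cat)) subProcessOf bp(GOBP:"cholesterol biosynthetic process")
# act(p(HGNC:HMGCR), ma(cat)) -> bp(GOBP:"cholesterol biosynthetic process")
```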

3.1.6. causesNoChange, cnc

For terms A and B, A causesNoChange B or A cnc B indicates that B was observed not to change in response to changes in A.

Statements using this relationship correspond to cases where explicit measurement of B demonstrates lack of significant change, not for cases where the state of B is unknown.

3.1.7. regulates, reg

For terms A and B, A regulates B or A reg B indicates that A is reported to have an effect on B, but information is missing about whether A increases B or A decreases B. This relationship provides more information than association, because the upstream entity (source term) and downstream entity (target term) can be assigned.
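The long and short forms of the causal relationships in this section can be collected in a lookup table. The Python sketch below uses the ASCII forms that appear in the statement examples of this document (-> and => for the arrows) and rewrites the top-level relationship of a statement into its short form. The helper names are hypothetical, and a nested statement in parentheses is left untouched by this simple tokenizer.

```python
CAUSAL_SHORT_FORMS = {
    "increases": "->",
    "directlyIncreases": "=>",
    "decreases": "-|",
    "directlyDecreases": "=|",
    "causesNoChange": "cnc",
    "regulates": "reg",
}


def split_top_level(statement):
    """Split on spaces that lie outside parentheses and quoted values."""
    tokens, current, depth, in_quotes = [], [], 0, False
    for ch in statement:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")":
            depth -= 1
        if ch == " " and depth == 0 and not in_quotes:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def to_short_relationship(statement):
    """Replace a long-form causal relationship name with its short form."""
    tokens = split_top_level(statement)
    return " ".join(CAUSAL_SHORT_FORMS.get(tok, tok) for tok in tokens)


print(to_short_relationship(
    "act(p(HGNC:FOXO3), ma(tscript)) directlyDecreases r(HGNC:MIR21)"
))
# act(p(HGNC:FOXO3), ma(tscript)) =| r(HGNC:MIR21)
```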

Direct Relationships

Direct relationships include direct causal relationships and non-causal relationships that are considered direct because they are self-referential.

Direct causal relationships

The direct causal relationships included in BEL v2.0 are directlyIncreases (=>) and directlyDecreases (=|).

The direct causal relationships are causal relationships where the mechanism of the causal relationship is based on the physical interaction of entities related to the BEL Statement subject and object terms.

If A or B is an abundance, then members of the abundance are part of the interaction. If A or B is an activity, then members of the abundance performing the activity physically interact.

Examples
Abundances and activities

Inhibition of the Patched 1 receptor signaling activity by Hedgehog is represented as direct, because Hedgehog and Patched 1 physically interact:

p(SFAM:"Hedgehog Family") =| act(p(HGNC:PTCH1))
Transcription

In the case of transcriptional activity, if the protein performing the transcriptional activity interacts with the gene that the RNA is transcribed from, the relationship is considered direct. For example, repression of the transcription of miR-21 by FOXO3 protein transcriptional activity is represented as direct because FOXO3 binds the miR-21 promoter:

act(p(HGNC:FOXO3),ma(tscript)) =| r(HGNC:MIR21)
Target term is BEL statement

If B is a BEL Statement, the relationship is considered direct if the subject abundance term for B physically interacts with the abundance term for A. For example, for the BEL Statement:

p(HGNC:CLSPN) => (act(p(HGNC:ATR), ma(kin)) => p(HGNC:CHEK1, pmod(Ph)))

CLSPN protein is considered to directly activate the phosphorylation of CHEK1 protein by the kinase activity of ATR, because the CLSPN and ATR proteins physically interact.

Self-referential relationships

Self-referential causal relationships are generally represented as direct. For example, phosphorylation of GSK3B at serine 9 inhibiting the kinase activity of GSK3B can be represented as:

p(HGNC:GSK3B, pmod(Ph, S, 9)) =| act(p(HGNC:GSK3B), ma(kin))

3.2. Correlative Relationships

These relationship types link abundances and biological processes when no causal relationship is known. The order of subject and object terms does not matter in a statement with a correlative relationship, unlike a statement with a causal relationship.

3.2.1. negativeCorrelation, neg

For terms A and B, A negativeCorrelation B or A neg B indicates that changes in A and B have been observed to be negatively correlated. The order of the subject and object does not affect the interpretation of the statement, thus B negativeCorrelation A is equivalent to A negativeCorrelation B.

3.2.2. positiveCorrelation, pos

For terms A and B, A positiveCorrelation B or A pos B indicates that changes in A and B have been observed to be positively correlated. The order of the subject and object does not affect the interpretation of the statement, thus B positiveCorrelation A is equivalent to A positiveCorrelation B.

3.2.3. association, --

For terms A and B, A association B or A -- B indicates that A and B are associated in an unspecified manner. This relationship is used when not enough information about the association is available to describe it using more specific relationships, like increases or positiveCorrelation. The order of the subject and object does not affect the interpretation of the statement, thus B -- A is equivalent to A -- B.
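Since subject and object order carries no meaning for correlative relationships, a tool that stores statements may want to treat both orderings as the same fact. The Python sketch below shows one way to do that by sorting the two terms; the function name and the choice of alphabetical ordering are assumptions made for illustration, not something the BEL specification prescribes.

```python
# Long names and the short forms neg and pos from this section.
CORRELATIVE = {
    "negativeCorrelation", "neg",
    "positiveCorrelation", "pos",
    "association",
}


def canonical_triple(subject, relationship, obj):
    """Order the terms of a correlative statement; leave other statements as is."""
    if relationship in CORRELATIVE and obj < subject:
        subject, obj = obj, subject
    return (subject, relationship, obj)


a = canonical_triple("p(HGNC:TGFB1)", "neg", "p(HGNC:IL6)")
b = canonical_triple("p(HGNC:IL6)", "neg", "p(HGNC:TGFB1)")
print(a == b)  # True: both orderings map to the same triple
```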

3.3. Genomic Relationships

These relationship types link related terms, like orthologous terms from two different species or the geneAbundance() and rnaAbundance() terms for the same namespace value.

Tip

In most cases, these relationships will be introduced by the BEL Namespace resources, and are not needed for creation of BEL Statements and BEL Documents.

3.3.1. orthologous

For terms A and B, A orthologous B indicates that A and B represent entities in different species which are sequence similar and which are therefore presumed to share a common ancestor. For example,

g(HGNC:AKT1) orthologous g(MGI:Akt1)

indicates that the mouse and human AKT1 genes are orthologs.

3.3.2. transcribedTo, :>

For RNA abundance term R and gene abundance term G, G transcribedTo R or G :> R indicates that members of R are produced by the transcription of members of G. For example:

g(HGNC:AKT1) :> r(HGNC:AKT1)

indicates that the human AKT1 RNA is transcribed from the human AKT1 gene.

3.3.3. translatedTo, >>

For RNA abundance term R and protein abundance term P, R translatedTo P or R >> P indicates that members of P are produced by the translation of members of R. For example:

r(HGNC:AKT1) >> p(HGNC:AKT1)

indicates that AKT1 protein is produced by translation of AKT1 RNA.

3.4. Other Relationships

Additional miscellaneous relationship types.

Tip

In most cases, these relationships will be introduced by the BEL Namespace resources, and are not needed for creation of BEL Statements and BEL Documents.

3.4.1. hasMember

For abundance terms A and B, A hasMember B designates B as a member class of A. A member class is a distinguished sub-class. A is defined as a group by all of the members assigned to it. The member classes may or may not be overlapping and may or may not entirely cover all instances of A. A term may not appear in both the subject and object of the same hasMember statement.

3.4.2. hasMembers

The hasMembers relationship is a special form which enables the assignment of multiple member classes in a single statement where the object of the statement is a set of abundance terms. A statement using hasMembers is exactly equivalent to multiple hasMember statements. A term may not appear in both the subject and object of the same hasMembers statement.

For the abundance terms A, B, C and D, A hasMembers list(B, C, D) indicates that A is defined by its member abundance classes B, C and D.

3.4.3. hasComponent

For complex abundance term A and abundance term B, A hasComponent B designates B as a component of A; that is, complexes that are instances of A have instances of B as possible components. Note that the stoichiometry of A is not described, nor is it stated that B is a required component. The use of hasComponent relationships is complementary to the use of functionally composed complexes and is intended to enable the assignment of components to complexes designated by names in external vocabularies. The assignment of components can potentially enable the reconciliation of equivalent complexes at knowledge assembly time.

3.4.4. hasComponents

The hasComponents relationship is a special form which enables the assignment of multiple complex components in a single statement where the object of the statement is a set of abundance terms. A statement using hasComponents is exactly equivalent to multiple hasComponent statements. A term may not appear in both the subject and object of the same hasComponents statement.

For the abundance terms A, B, C and D, A hasComponents list(B, C, D) indicates that A has components B, C and D.
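The equivalence between the list forms (hasMembers, hasComponents) and their single-object counterparts can be sketched as follows. The Python function name is hypothetical; it takes the terms as plain text and enforces the rule that a term may not appear as both subject and object.

```python
SINGULAR = {"hasMembers": "hasMember", "hasComponents": "hasComponent"}


def expand_list_statement(subject, relationship, objects):
    """Expand 'A hasMembers list(B, C)' into one statement per object."""
    single = SINGULAR[relationship]  # KeyError for other relationships
    if subject in objects:
        raise ValueError("a term may not be both subject and object")
    return [f"{subject} {single} {obj}" for obj in objects]


for line in expand_list_statement(
    'p(SFAM:"AKT Family")',
    "hasMembers",
    ["p(HGNC:AKT1)", "p(HGNC:AKT2)", "p(HGNC:AKT3)"],
):
    print(line)
# p(SFAM:"AKT Family") hasMember p(HGNC:AKT1)
# p(SFAM:"AKT Family") hasMember p(HGNC:AKT2)
# p(SFAM:"AKT Family") hasMember p(HGNC:AKT3)
```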

3.4.5. isA

For terms A and B, A isA B indicates that A is a subset of B.

All terms in BEL 1.0 represent classes, but given that classes implicitly have instances, A isA B is interpreted to mean that any instance of A must also be an instance of B. This relationship can be used to represent GO and MeSH hierarchies:

pathology(MESHD:Psoriasis) isA pathology(MESHD:"Skin Diseases")

3.4.6. subProcessOf

For process, activity, or transformation term A and process term P, A subProcessOf P indicates that instances of process P, by default, include one or more instances of A in their composition. For example, the reduction of HMG-CoA to mevalonate is a subprocess of cholesterol biosynthesis:

rxn(reactants(a(CHEBI:"(S)-3-hydroxy-3-methylglutaryl-CoA"),a(CHEBI:NADPH), a(CHEBI:hydron)),\
 products(a(CHEBI:mevalonate), a(CHEBI:"CoA-SH"), a(CHEBI:"NADP(+)"))) subProcessOf\
 bp(GOBP:"cholesterol biosynthetic process")

3.5. Deprecated Relationships

Warning

These BEL v1.0 relationships are supported in BEL v2.0, but are slated to be removed in the next major version.

3.5.1. analogous

For terms A and B, A analogousTo B indicates that A and B represent abundances or molecular activities which function in a similar manner, but do not share sequence similarity or a common ancestor.

3.5.2. biomarkerFor

For term A and process term P, A biomarkerFor P indicates that changes in, or detection of, A are used in some way as a biomarker for pathology or biological process P.

3.5.3. prognosticBiomarkerFor

For term A and process term P, A prognosticBiomarkerFor P indicates that changes in, or detection of, A are used in some way as a prognostic biomarker for the subsequent development of pathology or biological process P.

4. Appendices

Additional information supporting the BEL Language specification.

4.1. Namespaces Used in Examples

Namespaces are a reference to the specific vocabulary that a value used in a BEL Term comes from. The examples in this documentation use the following set of BEL Namespaces (v20131211) to reference external ontologies and vocabularies:

Namespace Abbreviation   Namespace Description
EGID                     Entrez Gene IDs
HGNC                     HGNC human gene symbols
MGI                      MGI mouse gene symbols
RGD                      RGD rat gene symbols
SP                       SwissProt accession numbers
MESHD                    Medical Subject Heading Disease names
MESHCS                   Medical Subject Heading Cellular Structure names
MESHPP                   Medical Subject Heading Process names
CHEBI                    Chemical Entities of Biological Interest names
GOBP                     Gene Ontology Biological Process names
GOCC                     Gene Ontology Cellular Component names
SCOMP                    Selventa Named Complexes
SFAM                     Selventa Protein Families

4.2. BEL Examples

The following pages contain examples of BEL Terms and BEL Statements. BEL Terms are used to represent biological entities including abundances and processes. These terms are used as the basis of BEL Statements that link one or more BEL Terms together with a relationship and/or additional context information to represent biological knowledge.

These examples are written in BEL Script format; see documentation for more information.

4.2.1. BEL Term Examples

Abundance Term Examples

Measurable entities like genes, RNAs, proteins, and small molecules are represented as abundances in BEL. BEL Terms for abundances have the general form a(ns:v), where a is an abundance function, ns is a namespace reference and v is a value from the namespace vocabulary. See Namespaces Used in Examples.

Chemicals and Small Molecules

The general abundance function abundance() is used to represent abundances of chemicals, small molecules, and any other entities that cannot be represented by a more specific abundance function.

Examples
Long Form
abundance(CHEBI:"nitrogen atom")
abundance(CHEBI:"prostaglandin J2")
Short Form
a(CHEBI:"nitrogen atom")
a(CHEBI:"prostaglandin J2")

These BEL Terms represent the abundance of the entities specified by nitrogen atom and by prostaglandin J2 in the CHEBI namespace.

Genes, RNAs, and proteins

The abundance functions geneAbundance(), rnaAbundance(), and proteinAbundance() are used with namespace values like HGNC human gene symbols, EntrezGene IDs, and SwissProt accession numbers to designate the type of molecule represented.

Examples

Abundances of the gene, RNA, and protein encoded by the human AKT1 gene are represented as:

Long Form
geneAbundance(HGNC:AKT1)
rnaAbundance(HGNC:AKT1)
proteinAbundance(HGNC:AKT1)
Short Form
g(HGNC:AKT1)
r(HGNC:AKT1)
p(HGNC:AKT1)

These BEL Terms represent the gene, RNA, and protein abundances of the entity specified by AKT1 in the HGNC namespace. Equivalent terms can be constructed using a corresponding value from a different namespace. For example, the abundance of the human AKT1 RNA can also be represented by referencing the EntrezGene ID or SwissProt accession namespaces:

r(EGID:207)
r(SP:P31749)

The BEL Framework identifies and merges corresponding terms created using different namespaces into a single term through namespace equivalencing.
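To make the idea of namespace equivalencing concrete, the Python sketch below maps the three AKT1 identifiers from this example onto one canonical value. The lookup table and function name are hypothetical and only cover this example; this is not how the BEL Framework is implemented, and quoted namespace values are not handled.

```python
import re

# Hypothetical equivalence table built from the AKT1 example above.
EQUIVALENCES = {
    "EGID:207": "HGNC:AKT1",
    "SP:P31749": "HGNC:AKT1",
}

# Matches unquoted namespace:value pairs such as EGID:207.
_NS_VALUE = re.compile(r"[A-Z]+:[A-Za-z0-9]+")


def canonical_term(term: str) -> str:
    """Rewrite known namespace:value pairs to their canonical equivalent."""
    return _NS_VALUE.sub(lambda m: EQUIVALENCES.get(m.group(0), m.group(0)), term)


terms = ["r(HGNC:AKT1)", "r(EGID:207)", "r(SP:P31749)"]
print({canonical_term(t) for t in terms})
# {'r(HGNC:AKT1)'}
```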

Protein families

Protein families are used to represent a group of functionally similar proteins. For example, AKT1, AKT2, and AKT3 together form the AKT family. Like other proteins, abundances of protein families are represented using the proteinAbundance() function, with namespace values from the Selventa named protein families namespace.

Example

This term represents the protein abundance of the AKT protein family.

p(SFAM:"AKT Family")
microRNAs

The abundance function microRNAAbundance() is used to represent the fully processed, active form of a microRNA. The specific abundance functions allow distinct representations of the gene, RNA, and microRNA abundances for a given namespace value.

Example

These BEL Terms represent the abundances of the gene, RNA, and processed microRNA, respectively, for the entity specified by Mir21 in the MGI mouse gene symbol namespace.

Long Form
geneAbundance(MGI:Mir21)
rnaAbundance(MGI:Mir21)
microRNAAbundance(MGI:Mir21)
Short Form
g(MGI:Mir21)
r(MGI:Mir21)
m(MGI:Mir21)
Complexes

The abundances of molecular complexes are represented using the complexAbundance() function. This function can take either a list of abundance terms or a value from a namespace of molecular complexes as its argument.

Example

Both BEL Terms represent the IkappaB kinase complex: the first by referencing a named protein complex within the GO Cellular Component namespace, and the second by enumerating the individual protein abundances that compose the IkappaB kinase complex (CHUK, IKBKB, and IKBKG).

Long Form
complexAbundance(GOCC:"IkappaB kinase complex")
complexAbundance(proteinAbundance(HGNC:CHUK), proteinAbundance(HGNC:IKBKB), proteinAbundance(HGNC:IKBKG))
Short Form
complex(GOCC:"IkappaB kinase complex")
complex(p(HGNC:CHUK), p(HGNC:IKBKB), p(HGNC:IKBKG))
Composite abundances

Multiple abundance terms can be represented together as the subject of a BEL Statement by using the compositeAbundance() function. This function takes a list of abundances as its argument and is used when the individual abundances do not act alone, but rather synergize to produce an effect.

Example

This term represents the combined abundances of TGFB1 and IL6 proteins.

Long Form
compositeAbundance(proteinAbundance(HGNC:TGFB1), proteinAbundance(HGNC:IL6))
Short Form
composite(p(HGNC:TGFB1), p(HGNC:IL6))
Activity Term Examples

Term activity functions are applied to protein, complex, and RNA abundances to specify the frequency of events resulting from the molecular activity of the abundance. This distinction is particularly useful for proteins whose activities are regulated by post-translational modification. Specific activity types can be indicated using the molecularActivity() process modifier function. The default BEL namespace includes molecular activity values corresponding to the BEL v1.0 activity functions, and GO Molecular Function namespace values can be used to indicate more specific molecular activities.

Non-Specified Activities

If the type of molecular activity is not reported, it does not need to be specified. The activity() function is sufficient for distinguishing the frequency of events mediated by an abundance from the amount of the abundance. This term represents the ligand-bound activity of the human non-catalytic receptor protein TLR7.

Long Form
activity(proteinAbundance(HGNC:TLR7))
Short Form
act(p(HGNC:TLR7))
Catalytic Activity

A protein, complex, or ribozyme has catalytic activity when it acts as an enzymatic catalyst of biochemical reactions. Catalytic activity includes kinase, phosphatase, peptidase, and ADP-ribosylase activities, though these can be represented by more specific molecular activity terms.

This term represents the frequency of events in which the protein abundance of rat Sod1 acts as a catalyst.

Long Form - default BEL namespace
activity(proteinAbundance(RGD:Sod1), molecularActivity(cat))
Long Form - GO Molecular Function (GOMF) namespace
activity(proteinAbundance(RGD:Sod1), molecularActivity(GOMF:"catalytic activity"))
Short Form - default BEL namespace
act(p(RGD:Sod1), ma(cat))
Short Form - GO Molecular Function namespace
act(p(RGD:Sod1), ma(GOMF:"catalytic activity"))
Peptidase Activity

This term represents the frequency of events in which the protein abundance of mouse Casp3 acts as a peptidase. The more specific GO Molecular Function term "cysteine-type endopeptidase activity" is also applicable.

Long Form - default BEL namespace
activity(proteinAbundance(MGI:Casp3), molecularActivity(pep))
Long Form - GO Molecular Function namespace
activity(proteinAbundance(MGI:Casp3), molecularActivity(GOMF:"peptidase activity"))
Short Form - default BEL namespace
act(p(MGI:Casp3), ma(pep))
Short Form - GO Molecular Function namespace
act(p(MGI:Casp3), ma(GOMF:"peptidase activity"))
G-proteins in the active (GTP-bound) state

The activity of guanine nucleotide-binding proteins (G-proteins) like RAS in the active, GTP-bound state. This term represents the frequency of events caused by the active, GTP-bound form of the RAS protein family.

Long Form - default BEL namespace
activity(proteinAbundance(SFAM:"RAS Family"), molecularActivity(gtp))
Long Form - GO Molecular Function namespace
activity(proteinAbundance(SFAM:"RAS Family"), molecularActivity(GOMF:"GTP binding"))
Short Form - default BEL namespace
act(p(SFAM:"RAS Family"), ma(gtp))
Short Form - GO Molecular Function namespace
act(p(SFAM:"RAS Family"), ma(GOMF:"GTP binding"))
Transporter Activity

Molecular translocation events mediated by transporter proteins like ion channels or glucose transporters. This term represents the frequency of ion transport events mediated by the epithelial sodium channel (ENaC) complex.

Long Form - default BEL namespace
activity(complexAbundance(SCOMP:"ENaC Complex"), molecularActivity(tport))
Long Form - GO Molecular Function namespace
activity(complexAbundance(SCOMP:"ENaC Complex"), molecularActivity(GOMF:"transporter activity"))
Short Form - default BEL namespace
act(complex(SCOMP:"ENaC Complex"), ma(tport))
Short Form - GO Molecular Function namespace
act(complex(SCOMP:"ENaC Complex"), ma(GOMF:"transporter activity"))
Chaperone Activity

This term represents the events in which the human Calnexin protein functions as a chaperone to aid the folding of other proteins.

Long Form - default BEL namespace
activity(proteinAbundance(HGNC:CANX), molecularActivity(chap))
Short Form - default BEL namespace
act(p(HGNC:CANX), ma(chap))
Transcription Activity

Events in which a protein or molecular complex acts to directly control transcription, including proteins acting directly as transcription factors, as well as transcriptional co-activators and co-repressors. This term represents the frequency of events in which the mouse p53 protein controls RNA expression.

Long Form - default BEL namespace
activity(proteinAbundance(MGI:Trp53), molecularActivity(tscript))
Long Form - GO Molecular Function Namespace
activity(proteinAbundance(MGI:Trp53), molecularActivity(GOMF:"nucleic acid binding transcription factor activity"))
Short Form - default BEL namespace
act(p(MGI:Trp53), ma(tscript))
Short Form - GO Molecular Function Namespace
act(p(MGI:Trp53), ma(GOMF:"nucleic acid binding transcription factor activity"))
Binding Interaction Term Examples

The complexAbundance() function can be used to specify molecular interactions between abundances. This function can take either a list of abundances that define a molecular complex or a namespace value that represents a molecular complex (e.g., many GO Cellular Component values) as an argument. These examples demonstrate the use of the complexAbundance() function to represent protein-protein, protein-chemical, and protein-DNA interactions.

Protein – protein interactions
Example - protein-protein interaction as BEL statement

This statement represents that MTOR and AKT1S1 proteins physically interact. Note that this statement consists of a single term only, with no relationship and no second term.

Long Form
SET Citation = {"PubMed", "Nat Cell Biol 2007 Mar 9(3) 316-23", "17277771"}
SET SupportingText = "Here, we identify PRAS40 (proline-rich Akt/PKB substrate
 40 kDa) as a novel mTOR binding partner"
# disambiguation PRAS40 = HGNC AKT1S1
complexAbundance(proteinAbundance(HGNC:AKT1S1), proteinAbundance(HGNC:MTOR))
Short Form
complex(p(HGNC:AKT1S1), p(HGNC:MTOR))
Example - protein-protein interaction as Statement object

Here, a protein-protein interaction is the object of a BEL Statement. This statement expresses that the MTOR and STAT3 proteins associate and that increases in the protein abundance of BMP4 can increase the abundance of the complex comprised of MTOR and STAT3.

Long Form
SET Citation = {"PubMed", "J Cell Biol. 2003 Jun 9;161(5):911-21.", "12796477"}
SET SupportingText = "Upon BMP4 treatment, the serine-threonine kinase
FKBP12/rapamycin-associated protein (FRAP), mammalian target of
rapamycin (mTOR), associates with Stat3 and facilitates STAT activation."
proteinAbundance(HGNC:BMP4) increases complexAbundance(proteinAbundance(HGNC:MTOR), proteinAbundance(HGNC:STAT3))
Short Form
p(HGNC:BMP4) -> complex(p(HGNC:MTOR), p(HGNC:STAT3))
Protein – DNA interactions
Example - transcription factor protein binding to DNA

This statement expresses that STAT3 protein binds to the CCL11 gene DNA, and that this association is increased by IL17A.

Long Form
SET Citation = {"PubMed", "J Immunol 2009 Mar 15 182(6) 3357-65", "19265112"}
SET SupportingText = "IL-17A induced at 1 h a marked enrichment of
 STAT3- associated CCL11 promoter DNA"
proteinAbundance(HGNC:IL17A) increases \
 complexAbundance(proteinAbundance(HGNC:STAT3), geneAbundance(HGNC:CCL11))
Short Form
p(HGNC:IL17A) -> complex(p(HGNC:STAT3), g(HGNC:CCL11))
Protein – small molecule interactions
Example - protein binding to a small molecule

This statement represents that PIP3 binds AKT1 protein.

Long Form
SET Citation = {"PubMed", "Breast Cancer Res 2005 7(4) R394-401", "15987444"}
SET SupportingText = "After PIP3 binding, Akt1 is activated"
# disambiguation PIP3 = CHEBI 1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate
complexAbundance(abundance(CHEBI:"1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate"), proteinAbundance(HGNC:AKT1))
Short Form
complex(a(CHEBI:"1-phosphatidyl-1D-myo-inositol 3,4,5-trisphosphate"), p(HGNC:AKT1))
Biological Processes and Pathologies Term Examples

Biological phenomena that occur at the level of the cell or the organism are considered processes. These terms are represented by values from namespaces like GO and MeSH.

Biological Processes

Cellular senescence can be represented by:

Long Form
biologicalProcess(GOBP:"cellular senescence")
Short Form
bp(GOBP:"cellular senescence")
Diseases and Pathologies

Disease pathologies like muscle hypotonia can be represented by:

Long Form
pathology(MESHD:"Muscle Hypotonia")
Short Form
path(MESHD:"Muscle Hypotonia")
Post-Translationally Modified Protein Term Examples

The proteinModification() or pmod() function is used within a protein abundance to specify post-translational modifications. Types of post-translational modification are specified by a namespace value; the default BEL namespace provides many commonly used modification types. Abundances of modified proteins take the form p(ns:v, pmod(<type>, <code>, <pos>)), where <type> (required) is the kind of modification, <code> (optional) is the one- or three-letter amino acid code for the modified residue, and <pos> (optional) is the sequence position of the modification.
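The required and optional arguments of pmod() can be illustrated with a small Python builder. The function name is hypothetical. Because the arguments are positional, the sketch assumes that a position can only be given together with an amino acid code; the text above does not state this rule explicitly.

```python
def pmod_term(protein, mod_type, code=None, pos=None):
    """Build a modified protein abundance term as text.

    protein  -- namespace and value, e.g. 'HGNC:HIF1A'
    mod_type -- modification type from the default BEL namespace, e.g. 'Hy'
    code     -- optional one- or three-letter amino acid code
    pos      -- optional sequence position (assumed to require `code`)
    """
    if pos is not None and code is None:
        raise ValueError("a position requires an amino acid code")
    args = [mod_type]
    if code is not None:
        args.append(code)
    if pos is not None:
        args.append(str(pos))
    return f"p({protein}, pmod({', '.join(args)}))"


print(pmod_term("HGNC:HIF1A", "Hy", "N", 803))  # p(HGNC:HIF1A, pmod(Hy, N, 803))
print(pmod_term("HGNC:MYC", "Ub", "Lys"))       # p(HGNC:MYC, pmod(Ub, Lys))
print(pmod_term("HGNC:SP1", "Glyco"))           # p(HGNC:SP1, pmod(Glyco))
```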

Hydroxylation

This term represents the abundance of human HIF1A protein hydroxylated at asparagine 803.

Long Form
proteinAbundance(HGNC:HIF1A, proteinModification(Hy, Asn, 803))
Short Form
p(HGNC:HIF1A, pmod(Hy, N, 803))
Phosphorylation

This term represents the abundance of the human AKT protein family phosphorylated at an unspecified amino acid residue.

p(SFAM:"AKT Family", pmod(Ph))
Acetylation

This term represents the abundance of mouse RELA protein acetylated at lysine 315.

p(MGI:Rela, pmod(Ac, Lys, 315))
Glycosylation

This term represents the abundance of human SP1 protein glycosylated at an unspecified amino acid residue.

p(HGNC:SP1, pmod(Glyco))
Methylation

This term represents the abundance of rat STAT1 protein methylated at an unspecified arginine residue:

p(RGD:STAT1, pmod(Me, Arg))
Ubiquitination

This term represents the abundance of human MYC protein ubiquitinated at an unspecified lysine residue:

p(HGNC:MYC, pmod(Ub, Lys))
Transformation Term Examples (Reactions, Translocations, Degradation)
Reactions

The reaction() or rxn() function expresses the transformation of reactants into products, each defined by a list of abundances.

Example

This BEL Term represents the reaction in which the reactants phosphoenolpyruvate and ADP are converted into pyruvate and ATP.

Long Form
reaction(reactants(abundance(CHEBI:phosphoenolpyruvate), abundance(CHEBI:ADP)),\
 products(abundance(CHEBI:pyruvate), abundance(CHEBI:ATP)))
Short Form
rxn(reactants(a(CHEBI:phosphoenolpyruvate), a(CHEBI:ADP)),\
 products(a(CHEBI:pyruvate), a(CHEBI:ATP)))
Translocations

Translocations, or the movement of abundances from one location to another, are represented in BEL Terms by the translocation() or tloc() function. For convenience, the frequently used translocations of abundances from inside the cell to the cell surface or to the extracellular space are represented by the cellSurfaceExpression() and cellSecretion() functions, respectively.

Example

This term represents the event in which human NFE2L2 protein is translocated from the cytoplasm to the nucleus.

Long Form
translocation(proteinAbundance(HGNC:NFE2L2), fromLoc(MESHCS:Cytoplasm), toLoc(MESHCS:"Cell Nucleus"))
Short Form
tloc(p(HGNC:NFE2L2), fromLoc(MESHCS:Cytoplasm), toLoc(MESHCS:"Cell Nucleus"))
Example - cell secretion

This term represents secretion of mouse IL6 protein.

Long Form
cellSecretion(proteinAbundance(MGI:Il6))
Short Form
sec(p(MGI:Il6))
Example - cell surface expression

This term represents cell surface expression of rat Fas protein.

Long Form
cellSurfaceExpression(proteinAbundance(RGD:Fas))
Short Form
surf(p(RGD:Fas))
Degradation

Events in which an abundance is degraded can be represented by the degradation() or deg() function.

Example

This term represents the degradation of MYC RNA. Degradation decreases the amount of the abundance - when degradation statements are compiled, a directlyDecreases relationship edge is added between the degradation term and the degraded entity.

Long Form
degradation(rnaAbundance(HGNC:MYC))
Short Form
deg(r(HGNC:MYC))
Variant (Mutant) Protein Examples

The abundances of mutated and variant proteins can be represented in BEL using the abundance modifier function variant("") and the other function fusion().

Amino Acid Substitutions

The abundances of proteins with amino acid sequence variations, such as those resulting from missense mutations or polymorphisms can be specified by using the variant("") or var("") function within a protein abundance term.

Example
Long Form
proteinAbundance(HGNC:PIK3CA, variant("p.Glu545Lys"))
Short Form
p(HGNC:PIK3CA, var("p.Glu545Lys"))

This term represents the abundance of the human PIK3CA protein in which the glutamic acid residue at position 545 has been substituted with a lysine.

Truncated Proteins

The abundances of proteins that are truncated by the introduction of a stop codon can be specified by using the variant("") or var("") function within a protein abundance term.

Example
Long Form
proteinAbundance(HGNC:ABCA1, variant("p.Arg1851*"))
Short Form
p(HGNC:ABCA1, var("p.Arg1851*"))

This term represents the abundance of human ABCA1 protein that has been truncated by substitution of arginine 1851 with a stop codon.

Fusion Proteins

The abundances of fusion proteins resulting from chromosomal translocation mutations can be specified by using the fusion() or fus() function within a protein abundance term.

Example
Long Form
proteinAbundance(fusion(HGNC:BCR, "p.1_426", HGNC:JAK2, "p.812_1132"))
Short Form
p(fus(HGNC:BCR, "p.1_426", HGNC:JAK2, "p.812_1132"))

This term represents the abundance of a fusion protein of the 5' partner BCR and 3' partner JAK2, with the breakpoint for BCR at amino acid 426 and JAK2 at 812. p. indicates that the protein sequence is used for the range coordinates provided. If the breakpoint is not specified, the fusion protein abundance can be represented as:

p(fus(HGNC:BCR, "?", HGNC:JAK2, "?"))

The fusion() function can also be used within geneAbundance and rnaAbundance terms to represent genes and RNAs modified by fusion mutations.
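
The fusion term layout described above can be sketched as a small string builder in Python. The function name fusion_term and its parameters are hypothetical and exist only for this illustration; the range arguments default to "?" to mirror the unspecified-breakpoint form.

```python
# Hypothetical builder for short-form fusion terms. It does no validation of
# namespaces or range strings; it only assembles the text.
def fusion_term(five_prime, three_prime, range5="?", range3="?", function="p"):
    """Build a short-form fusion term such as p(fus(A, "r1", B, "r2"))."""
    if function not in ("p", "r", "g"):
        raise ValueError("fusion is shown here only for p(), r(), and g() terms")
    return f'{function}(fus({five_prime}, "{range5}", {three_prime}, "{range3}"))'


# With breakpoints given as protein coordinates
print(fusion_term("HGNC:BCR", "HGNC:JAK2", "p.1_426", "p.812_1132"))
# With unspecified breakpoints
print(fusion_term("HGNC:BCR", "HGNC:JAK2"))
```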

4.2.2. BEL Statement Examples

Causal Statement Examples

Causal statements connect subject and object terms with a causal increases, decreases, or causesNoChange relationship. Subject terms can be an abundance or process (including activities and transformations) and object terms can be either an abundance, a process, or a second BEL Statement.

Causal increase
Example

These two statements use the causal increases relationship to represent the observation that increases in IL6 protein abundance cause increases in the RNA abundance of ENO1 and XBP1. The statements are annotated with a citation and supporting evidence text, and with CellLine and Species annotations indicating that the experimental observation was made in the cell line "U266" and the species "9606" (Homo sapiens).

Long Form
SET Citation = {"PubMed", "Int J Oncol 1999 Jul 15(1) 173-8", "10375612"}
SET SupportingText = "Northern blot analysis documented that two
 transcription factor genes chosen for further study, c-myc
 promoter-binding protein (MBP-1) and X-box binding protein 1
 (XBP-1), were up-regulated in U266 cells about 3-fold relative
 to the cell cycle-dependent beta-actin gene 12 h after IL-6
 treatment"
SET CellLine = "U266"
SET Species = "9606"
# disambiguation MBP-1 = HGNC ENO1
proteinAbundance(HGNC:IL6) increases rnaAbundance(HGNC:ENO1)
proteinAbundance(HGNC:IL6) increases rnaAbundance(HGNC:XBP1)
Short Form
p(HGNC:IL6) -> r(HGNC:ENO1)
p(HGNC:IL6) -> r(HGNC:XBP1)
Causal decrease
Example

This statement demonstrates a causal statement using the decreases relationship. The statement expresses that increases in the abundance of corticosteroid molecules cause decreases in the frequency or intensity of the biological process inflammation. This statement is annotated with an Anatomy and Disease to indicate that the relationship was observed in the context of the cardiovascular system and the disease Stroke.

Long Form
SET Citation = {"PubMed", "J Mol Med. 2003 Mar;81(3):168-74. Epub 2003 Mar 14.", "12682725"}
SET SupportingText = "high-dose steroid treatment decreases vascular
 inflammation and ischemic tissue damage after myocardial
 infarction and stroke through direct vascular effects involving
 the nontranscriptional activation of eNOS"
SET Anatomy = "cardiovascular system"
SET MeSHDisease = "Stroke"
abundance(CHEBI:corticosteroid) decreases biologicalProcess(MESHD:Inflammation)
Short Form
a(CHEBI:corticosteroid) -| bp(MESHD:Inflammation)
Causes no change

The causesNoChange relationship can be used to record the lack of an observed effect.

Example

The epidermal growth factor receptor (EGFR) ligand amphiregulin (AREG) is observed to increase NF-kappaB transcriptional activity while the EGFR ligand EGF has no effect. These statements express that an increase in AREG protein abundance causes an observed increase in the transcriptional activity of the NF-kappaB complex, and that an increase in EGF protein abundance does not.

Long Form
SET Citation = {"PubMed", "Mol Cancer Res 2007 Aug 5(8) 847-61", "17670913"}
SET SupportingText = "Furthermore, EGFR, activated by amphiregulin but not
 epidermal growth factor, results in the prompt activation of the
 transcription factor nuclear factor-kappaB (NF-kappaB)"
# disambiguation Amphiregulin = HGNC AREG
proteinAbundance(HGNC:AREG) increases activity(complexAbundance(GOCC:"NF-kappaB complex"), molecularActivity(tscript))
proteinAbundance(HGNC:EGF) causesNoChange activity(complexAbundance(GOCC:"NF-kappaB complex"), molecularActivity(tscript))
Short Form
p(HGNC:AREG) -> act(complex(GOCC:"NF-kappaB complex"), ma(tscript))
p(HGNC:EGF) causesNoChange act(complex(GOCC:"NF-kappaB complex"), ma(tscript))
Correlative Statement Examples

Correlative Relationships link abundances and biological processes when no causal relationship is known.

positiveCorrelation and negativeCorrelation

This statement expresses that an increase in cytoplasmic FGF2 protein positively correlates with an increase in the pathology Chronic Obstructive Pulmonary Disease. The subject and object terms of correlative statements are interchangeable. The negativeCorrelation relationship is used to represent inverse correlative relationships, i.e., a decrease in A is correlated with an increase in B.

SET Citation = {"PubMed", "J Pathol. 2005 May;206(1):28-38.", "15772985"}
SET SupportingText = "Quantitative digital image analysis revealed
increased cytoplasmic expression of FGF-2 in bronchial epithelium
(0.35 +/- 0.03 vs 0.20 +/- 0.04, p < 0.008) and nuclear
localization in ASM (p < 0.0001) in COPD patients compared with
controls."
SET Tissue = "epithelium"
proteinAbundance(HGNC:FGF2, location(GOCC:cytoplasm)) positiveCorrelation \
 pathology(MESHD:"Pulmonary Disease, Chronic Obstructive")
association

The direction of causal effect or correlation of two abundance or biological process terms is not always specified. The association relationship can be used in these cases.

This statement represents that the abundance of the protein designated by the name Nr2f2 in the MGI namespace is associated in an unspecified manner with the biological process angiogenesis.

Long Form
SET Citation = {"PubMed", "Mech Ageing Dev. 2004 Oct-Nov;125(10-11):719-32.", "15541767"}
SET SupportingText = "COUP-TFII is involved in the angiogenic process in the developing embryos."
# disambiguation - COUP-TFII refers to MGI Nr2f2
SET MeSHAnatomy = "Embryo, Mammalian"
proteinAbundance(MGI:Nr2f2) association biologicalProcess(GOBP:angiogenesis)
Short Form
p(MGI:Nr2f2) -- bp(GOBP:angiogenesis)
Direct Causal Statement Examples

The following examples demonstrate the use of direct causal relationships in causal statements. The direct causal relationships directlyIncreases and directlyDecreases are special forms of the causal increases and decreases relationships where the mechanism of the causal relationship involves the physical interaction of entities related to the BEL Statement subject and object terms.

Example - Ligand and Receptor

In this example, the directlyIncreases relationship is used to represent activation of a receptor by its ligand. This statement expresses that amphiregulin (AREG) activates its receptor, the Epidermal Growth Factor Receptor (EGFR). This relationship is direct because ligands directly interact with their receptors.

Long Form
SET Citation = {"PubMed", "Mol Cancer Res 2007 Aug 5(8) 847-61", "17670913"}
SET SupportingText = "Furthermore, EGFR, activated by amphiregulin"
# disambiguation Amphiregulin = HGNC AREG
# EGFR is known to have kinase activity
proteinAbundance(HGNC:AREG) directlyIncreases activity(proteinAbundance(HGNC:EGFR), molecularActivity(kin))
Short Form
p(HGNC:AREG) => act(p(HGNC:EGFR), ma(kin))
Example - Kinase and Substrate

In this example, the directlyIncreases relationship is used to represent the phosphorylation of a protein substrate by a kinase. This statement expresses that the kinase activity of CDK1 protein causes an increase in the modification of FOXO1 protein by phosphorylation at serine 249. The relationship is direct because the kinase physically interacts with its target.

Long Form
SET Citation = {"PubMed", "Science 2008 Mar 21 319(5870) 1665-8.", "18356527"}
SET SupportingText = "We found that Cdk1 phosphorylated the
 transcription factor FOXO1 at Ser249 in vitro and in vivo."
activity(proteinAbundance(HGNC:CDK1), molecularActivity(kin)) directlyIncreases \
 proteinAbundance(HGNC:FOXO1, proteinModification(Ph, Ser, 249))
Short Form
act(p(HGNC:CDK1), ma(kin)) => p(HGNC:FOXO1, pmod(Ph, S, 249))
Example - Catalyst and Reaction

In this example, the direct activation of a reaction by a catalytic enzyme is represented. The statement indicates that an increase in the catalytic activity of ALOX5 increases the transformation of the reactant '5(S)-HPETE' to the products 'leukotriene A4' and 'water'. The relationship is considered direct because ALOX5 protein is the catalyzing enzyme.

Long Form
SET Citation = {"Other", "Reactome: Leukotriene synthesis", "REACT_15354.1"}
SET SupportingText = "Dehydration of 5-HpETE to leukotriene A4. In the
 second step, 5-lipoxygenase converts 5-HpETE to an allylic
 epoxide, leukotriene A4."
activity(proteinAbundance(HGNC:ALOX5), molecularActivity(cat)) directlyIncreases \
 reaction(reactants(abundance(CHEBI:"5(S)-HPETE")), \
 products(abundance(CHEBI:"leukotriene A4"), abundance(CHEBI:water)))
Short Form
act(p(HGNC:ALOX5), ma(cat)) => rxn(reactants(a(CHEBI:"5(S)-HPETE")), products(a(CHEBI:"leukotriene A4"), a(CHEBI:water)))
Example - Self-Referential Relationships

In this example, the directlyDecreases relationship is used to represent the effect of a protein modification on the activity of the same protein. These statements express that the modification of GSK3A and GSK3B protein by phosphorylation on serines 21 and 9, respectively, inhibits the activity of GSK3A and GSK3B. These relationships are considered direct because they are self-referential: the modification of the protein abundance by phosphorylation inhibits the activity of the same protein abundance.

Long Form
SET Citation = {"PubMed", "Proc Natl Acad Sci U S A 2000 Oct 24 97(22) 11960-5", "11035810"}
SET SupportingText = "GSK-3 activity is inhibited through phosphorylation
 of serine 21 in GSK-3 alpha and serine 9 in GSK-3 beta."
proteinAbundance(HGNC:GSK3A, proteinModification(Ph, Ser, 21)) \
 directlyDecreases activity(proteinAbundance(HGNC:GSK3A))
proteinAbundance(HGNC:GSK3B, proteinModification(Ph, Ser, 9)) \
 directlyDecreases activity(proteinAbundance(HGNC:GSK3B))
Short Form
p(HGNC:GSK3A, pmod(Ph, S, 21)) =| act(p(HGNC:GSK3A))
p(HGNC:GSK3B, pmod(Ph, S, 9)) =| act(p(HGNC:GSK3B))
Example - Direct Transcriptional Control

In this example, the direct activation of RNA transcription is encoded. The statement expresses that increases in the transcriptional activity of FOXO1 protein directly increase the RNA abundance of CEBPB. This relationship is considered direct because the transcription factor, FOXO1, directly binds the promoter of the CEBPB gene, increasing the expression of CEBPB RNA.

Long Form
SET Citation = {"PubMed", "Biochem Biophys Res Commun. 2009 Jan 9;378(2):290-5. Epub 2008 Nov 21.", "19026986"}
SET SupportingText = "We found that Foxo1 increased the expression of
 CCAAT/enhancer binding protein (C/EBPbeta, a positive regulator
 of monocyte chemoattractant protein (MCP)-1 and interleukin
 (IL)-6 genes, through directly binding to its promoter."
activity(proteinAbundance(HGNC:FOXO1), molecularActivity(tscript)) \
 directlyIncreases rnaAbundance(HGNC:CEBPB)
Short Form
act(p(HGNC:FOXO1), ma(tscript)) => r(HGNC:CEBPB)
Nested Statement Example

This example demonstrates use of a nested causal statement in which the object of a causal statement is itself a causal statement. In the relationship described by the evidence text, CLSPN specifically increases the activity of ATR to phosphorylate the target protein CHEK1 and does not affect the kinase activity of ATR towards its other targets. The use of the nested statement allows the representation of the information that CLSPN increases the phosphorylation of CHEK1 via the kinase activity of ATR, without incorrectly indicating that CLSPN generally increases the kinase activity of ATR.

Long Form
SET Citation = {"PubMed", "Mol Cell Biol 2006 Aug 26(16) 6056-64.", "16880517"}
SET Species = "9606"
SET SupportingText = "Consistently, the RNAi-mediated ablation of Claspin
 selectively abrogated ATR's ability to phosphorylate Chk1 but not
 other ATR targets."
proteinAbundance(HGNC:CLSPN) increases \
(activity(proteinAbundance(HGNC:ATR), molecularActivity(kin)) directlyIncreases proteinAbundance(HGNC:CHEK1, proteinModification(Ph)))
Short Form
p(HGNC:CLSPN) -> (act(p(HGNC:ATR), ma(kin)) => p(HGNC:CHEK1, pmod(Ph)))
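
The correspondence between the long and short forms used throughout these examples can be sketched as a simple text substitution in Python. The tables below list only function and relationship names that appear in the examples of this section, and to_short_form is a hypothetical helper name. The sketch is not quote-aware, so a namespace value that happened to contain one of these names followed by an opening parenthesis would also be rewritten. causesNoChange is left unchanged because the examples show no symbol for it.

```python
import re

# Long-form function names and their short forms, as used in the examples.
FUNCTIONS = {
    "proteinAbundance": "p",
    "rnaAbundance": "r",
    "abundance": "a",
    "complexAbundance": "complex",
    "biologicalProcess": "bp",
    "activity": "act",
    "molecularActivity": "ma",
    "proteinModification": "pmod",
    "degradation": "deg",
    "reaction": "rxn",
    "variant": "var",
    "fusion": "fus",
}

# Long-form relationship names and their short-form symbols.
RELATIONSHIPS = {
    "increases": "->",
    "decreases": "-|",
    "directlyIncreases": "=>",
    "directlyDecreases": "=|",
    "association": "--",
}

# A function name counts only when it starts a word and is followed by "(".
_FUNC_RE = re.compile(r"\b(" + "|".join(FUNCTIONS) + r")\(")
# A relationship counts only when it is surrounded by whitespace.
_REL_RE = re.compile(r"(?<=\s)(" + "|".join(RELATIONSHIPS) + r")(?=\s)")


def to_short_form(statement):
    """Rewrite a single-line long-form statement into its short form."""
    statement = _FUNC_RE.sub(lambda m: FUNCTIONS[m.group(1)] + "(", statement)
    return _REL_RE.sub(lambda m: RELATIONSHIPS[m.group(1)], statement)


print(to_short_form(
    "proteinAbundance(HGNC:IL6) increases rnaAbundance(HGNC:ENO1)"
))
```

Statements split across lines with a trailing backslash would need to be joined into one line before being passed to this helper.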

4.2.3. Other Examples

BEL Statement Annotation Examples

Annotations associate context information with BEL Statements, including citation of the source material, evidence text supporting the statement, and the experimental context for the scientific observations represented by the statement. To associate Annotations with statements, Annotations are SET and UNSET within a BEL Document. In the BEL Script syntax, once an Annotation has been SET, all following statements inherit the annotation until it is explicitly UNSET or a new Annotation of the same type is SET.
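
The inheritance rule described above can be sketched in Python as a single pass over the lines of a script. This is a simplified illustration, not a BEL Script parser: the function name attach_annotations is hypothetical, only single-line SET and UNSET lines are recognized, and multi-line values, line continuations, and any special handling of Citation are not modeled. The EX namespace and the annotation values in the sample are placeholders.

```python
import re

_SET_RE = re.compile(r"^SET\s+(\w+)\s*=\s*(.+)$")
_UNSET_RE = re.compile(r"^UNSET\s+(\w+)$")


def attach_annotations(lines):
    """Pair each statement with the annotations in effect when it appears."""
    current = {}
    annotated = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _SET_RE.match(line)
        if match:
            # A new SET of the same type replaces the previous value.
            current[match.group(1)] = match.group(2).strip('"')
            continue
        match = _UNSET_RE.match(line)
        if match:
            current.pop(match.group(1), None)
            continue
        # Anything else is treated as a statement; copy the current context.
        annotated.append((line, dict(current)))
    return annotated


script = [
    'SET Species = "9606"',
    'SET CellLine = "LineA"',      # placeholder value
    "p(EX:A) -> r(EX:B)",          # placeholder statement
    'SET CellLine = "LineB"',
    "p(EX:A) -> r(EX:C)",
    "UNSET CellLine",
    "p(EX:A) -> r(EX:D)",
]
for statement, context in attach_annotations(script):
    print(statement, context)
```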

Citation

Citations are a special type of annotation that references the knowledge source that reports the observation that the statement is based on. Citations are composed of a document type, a document name, a document reference ID, and an optional publication date, authors list and comment field. For example, the citation for a journal article indexed by PubMed would be encoded as:

SET Citation = {"PubMed", "Genes Cancer. 2010 Jun;1(6):560-567.", "21533016"}

The document name is a text string containing the reference information, the type is PubMed, and the document reference is the PubMed ID.

The citation for a Reactome pathway would be encoded as:

SET Citation = {"Online Resource", "p53-Dependent G1 DNA Damage Response", "REACT_1625.1"}

In this case, the document name is the pathway name, the type is Online Resource, and the reference is the Reactome identifier.

Support (previously known as Supporting Text)

Support annotations provide the specific text that the statement is derived from. Text should come directly from the abstract or full text of the source referenced by the citation annotation. For example, a support line from the Reactome pathway cited above is:

SET Support = "The p53 protein activates the transcription of cyclin-dependent kinase inhibitor, p21.
p21 inactivates the CyclinE:Cdk2 complexes, and prevent entry of the cell into S phase, leading to G1 arrest."
Species

Species annotations indicate the species context for the experimental observation represented by the statement. It is good practice to unambiguously assign species context to BEL Statements, even though many BEL Terms are derived from a species-specific namespace (e.g., HGNC, MGI, RGD). Species annotation uses the NCBI taxonomy ID:

SET Species = "9606"

Sets the species as Homo sapiens.

SET Species = "10090"

Sets the species as Mus musculus.

SET Species = "10116"

Sets the species as Rattus norvegicus.

Other Annotation Types

Other types of annotations can be added to statements to indicate the context of the experimental observation supported by the statement, including cell line, cell type, and cellular location. For example:

SET Cell = "Adipocytes, White"
SET CellLine = "LoVo"
SET Disease = "Lupus Erythematosus, Systemic"
SET Anatomy = "Pulmonary Artery"
Tip

In a BEL Document each Annotation Type that will be used, except for Citation and SupportingText, must be defined in the document header, along with the values allowed for each.

Membership Assignment Examples

These examples demonstrate the assignment of members to groups. Because all BEL terms denote classes, membership in a group is an important special case where subsets of a class that define the class are designated.

Tip

The BEL Framework adds family members to protein families and complex components to named complexes during network compilation.

Protein Family

In this example, members of a protein family are assigned using the hasMember and hasMembers relationships.

The hasMembers relationship is used to assign a list of protein abundances as members of a protein family. This relationship is a syntactic convenience that is equivalent to the set of two statements using the hasMember relationship. These statements designate the protein abundances of MAPK8 and MAPK9 as members of the JNK MAPK protein family. The term representing the JNK family is a protein abundance based on the name MAPK JNK Family in the Selventa Protein Families namespace.

p(SFAM:"MAPK JNK Family") hasMembers list(p(HGNC:MAPK8), p(HGNC:MAPK9))

The hasMember relationship is used to assign individual protein abundances to a protein family.

p(SFAM:"MAPK JNK Family") hasMember p(HGNC:MAPK8)
p(SFAM:"MAPK JNK Family") hasMember p(HGNC:MAPK9)
Complex Component

In this example components are assigned to a named protein complex using the hasComponent and hasComponents relationships.

The hasComponents relationship is similar to the hasMembers relationship and is used to assign a list of abundances as components of a complex. These statements designate the protein abundances of RAD9A, RAD1, and HUS1 as components of the complex abundance of the checkpoint clamp complex.

complex(GOCC:"checkpoint clamp complex") hasComponents list(p(HGNC:RAD9A), p(HGNC:RAD1), p(HGNC:HUS1))

The hasComponent relationship is used to assign individual abundances to a named protein complex.

complex(GOCC:"checkpoint clamp complex") hasComponent p(HGNC:RAD9A)
complex(GOCC:"checkpoint clamp complex") hasComponent p(HGNC:RAD1)
complex(GOCC:"checkpoint clamp complex") hasComponent p(HGNC:HUS1)

The single hasComponents statement is equivalent to the set of three hasComponent statements.
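
The equivalence between the list relationships and their single-member counterparts can be sketched in Python as follows. The helper names split_top_level and expand_list_statement are hypothetical, and the sketch assumes a single-line statement of the form "subject hasMembers list(...)" or "subject hasComponents list(...)".

```python
# Map each list relationship to its single-member counterpart.
SINGULAR = {"hasMembers": "hasMember", "hasComponents": "hasComponent"}


def split_top_level(arguments):
    """Split on commas that are outside parentheses and double quotes."""
    parts, current, depth, in_quote = [], [], 0, False
    for ch in arguments:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


def expand_list_statement(statement):
    """Expand a hasMembers/hasComponents statement into single statements."""
    for plural, singular in SINGULAR.items():
        marker = f" {plural} list("
        if marker in statement and statement.endswith(")"):
            subject, rest = statement.split(marker, 1)
            members = split_top_level(rest[:-1])  # drop the closing ")"
            return [f"{subject} {singular} {member}" for member in members]
    return [statement]  # not a list statement; return unchanged


for line in expand_list_statement(
    'complex(GOCC:"checkpoint clamp complex") hasComponents '
    "list(p(HGNC:RAD9A), p(HGNC:RAD1), p(HGNC:HUS1))"
):
    print(line)
```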

4.3. BEL Best Practices - Updated for BEL v2

These pages contain suggestions and guidelines for representing scientific findings in BEL.

4.3.1. Representation of Experimental Data

In a causal BEL Statement, the subject term frequently represents an experimentally manipulated entity while the object term represents a measured entity. Our best practices apply different levels of inference for mapping subject and object terms, particularly for representing 'omic data.

Subject Terms (Perturbations)
BELv2 How should I represent chemical inhibitor experiments?

For experiments where protein activity is perturbed with a chemical inhibitor, we generally use the chemical as the subject term and not the activity of the target protein. In many cases, the effects of the chemical are not specific to the intended target. This representation approach avoids unintended attribution of off-target effects of a chemical to the target protein.

For example, treatment of cells with the PI3 kinase inhibitor LY294002 significantly decreases expression of TGFB2 RNA (PMID 20629536):

a(SCHEM:"LY 294002") -| r(HGNC:TGFB2)

In a case where more information is available, the protein activity targeted by the inhibitor can be used as the subject term. For example, if the effect of LY294002 on TGFB2 RNA expression was demonstrated to require the PIK3CA gene, we could represent the subject term as the kinase activity of the PIK3CA protein.

act(p(HGNC:PIK3CA), ma(kin)) -> r(HGNC:TGFB2)
How do I represent experiments that use site-directed mutants?

The artificial (laboratory) creation of sequence variants is often used to investigate the effects of protein activity or specific post-translational modifications. These include proteins altered to be constitutively active or dominant negative, as well as proteins with specific amino acid residues altered to prevent phosphorylation. While many of these sequence alterations can be precisely represented with BEL, this may not be the best approach to capturing the observations from experiments that use these constructs.

Non-phosphorylatable mutant

In this example, mutation of FOXO1 serine 256 to alanine is used to block phosphorylation at 256 (S256A), a site phosphorylated by AKT. The S256A mutation was found to impair phosphorylation of threonine 24 and serine 319 by AKT (PMID 11237865). We could represent this observation as follows:

p(HGNC:FOXO1, var("p.Ser256Ala")) =| p(HGNC:FOXO1, pmod(Ph, Ser, 256))
p(HGNC:FOXO1, var("p.Ser256Ala")) =| (p(SFAM:"AKT Family") => p(HGNC:FOXO1, pmod(Ph, Thr, 24)))
p(HGNC:FOXO1, var("p.Ser256Ala")) =| (p(SFAM:"AKT Family") => p(HGNC:FOXO1, pmod(Ph, Ser, 319)))

The first statement indicates that phosphorylation at S256 is blocked by mutation of S256 to alanine. The next two statements indicate that the S256A mutation decreases phosphorylation of FOXO1 threonine 24 and serine 319 by AKT. However, we are not generally interested in the effects of a lab-created mutant like S256A so much as the role of phosphorylation at serine 256 on phosphorylation of the other two sites. Thus, we recommend the following representation:

p(HGNC:FOXO1, pmod(Ph, Ser, 256)) => (p(SFAM:"AKT Family") => p(HGNC:FOXO1, pmod(Ph, Thr, 24)))
p(HGNC:FOXO1, pmod(Ph, Ser, 256)) => (p(SFAM:"AKT Family") => p(HGNC:FOXO1, pmod(Ph, Ser, 319)))

Here, the statements indicate that phosphorylation of FOXO1 at S256 increases the phosphorylation of T24 and S319 by the kinase activity of AKT. While both representations are accurate, the second version is better suited to integrating other information about the role of FOXO1 phosphorylation at S256 into a cohesive, traversable model.

How do I represent observations resulting from manipulation of two or more entities?

In some cases an experiment has a complex perturbation, where manipulations of multiple biological entities are required for an effect. Multiple BEL abundance terms can be represented together as the subject of a BEL Statement by using the compositeAbundance() or composite() function.

In this example, TGF-beta cooperates with IL-6 to generate T-helper 17 cells (PMID 17918200):

composite(p(MGI:Tgfb1), p(MGI:Il6)) -> bp(GOBP:"T-helper 17 cell differentiation")

If the two manipulated components are known to physically interact (such as a receptor and its ligand), we recommend inferring their effects rather than using a composite term.

In this example, both Met and Hgf (the Met ligand) are required for increased expression of integrin Itgav RNA (PMID 16710476):

Important

Not recommended:

composite(p(MGI:Hgf), p(MGI:Met)) -> r(MGI:Itgav)

Recommended:

act(p(MGI:Met), ma(kin)) -> r(MGI:Itgav)
p(MGI:Hgf) -> r(MGI:Itgav)

Because Hgf binds to and directly activates Met, the effect of Met and Hgf together on Itgav RNA expression can be inferred to result from Met activity.

How should I represent gene knock out or RNAi experiments?

Our general practice is to represent the subject term for experiments where the perturbation is a gene deletion or RNAi knockdown as the abundance of the corresponding protein.

Gene knockouts

In this example, mice with a gene deletion of Nfe2l2 express reduced mRNA of the glutathione S transferase Gsta1 compared to wild-type mice (PMID 11991805):

p(MGI:Nfe2l2) -> r(MGI:Gsta1)
RNA interference

In this example, knockdown of PTEN using RNA interference results in increased CDKN1A protein levels (PMID 17300726):

p(HGNC:PTEN) -| p(HGNC:CDKN1A)

We assume that the effects of PTEN RNAi are due to knock down of PTEN protein. Decreased PTEN protein resulting in increased CDKN1A protein is interpreted as PTEN decreases CDKN1A protein.
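
The sign reasoning behind the knockout and RNAi examples above can be sketched in Python. The perturbation and observation labels, the numeric sign convention, and the function name infer_relationship are all assumptions of this illustration. It returns only the non-direct relationship symbols and does not decide whether a statement should be modeled as direct.

```python
# Direction in which the perturbation moves the subject protein abundance.
# The "overexpression" entry is included for completeness.
PERTURBATION_SIGN = {"knockout": -1, "knockdown": -1, "overexpression": +1}

# Direction of the measured change relative to the control condition.
OBSERVED_SIGN = {"decreased": -1, "increased": +1}


def infer_relationship(perturbation, observed):
    """Return "->" or "-|" for the inferred subject-to-object relationship."""
    sign = PERTURBATION_SIGN[perturbation] * OBSERVED_SIGN[observed]
    # Same direction means the subject increases the object;
    # opposite direction means the subject decreases the object.
    return "->" if sign > 0 else "-|"


# Nfe2l2 deletion with reduced Gsta1 mRNA
print("p(MGI:Nfe2l2)", infer_relationship("knockout", "decreased"), "r(MGI:Gsta1)")
# PTEN knockdown with increased CDKN1A protein
print("p(HGNC:PTEN)", infer_relationship("knockdown", "increased"), "p(HGNC:CDKN1A)")
```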

It is generally preferable to represent the subject term as the protein abundance and not an activity of the protein, particularly for 'omic experiments. See also When should I use the protein abundance vs. the activity of a protein?

How should I represent overexpression experiments?

For experiments where the perturbation is overexpression of DNA or RNA for the purpose of overexpressing a protein, we generally represent the subject term as a protein abundance.

In this example, SIAH2 and repp86 (TPX2) proteins interact, and overexpression of SIAH2 by transfection increases degradation of TPX2 protein (PMID 17716627):

p(HGNC:SIAH2) => deg(p(HGNC:TPX2))

The statement is modeled as direct because the subject and object term proteins interact.

While it would be technically correct to represent overexpressions achieved via DNA transfection as gene abundances and those from mRNA transfections as RNA abundances, this distinction is not useful for applications like Whistle and pathfinding. It is generally preferable to represent the subject term as the protein abundance and not an activity of the protein, particularly for 'omic experiments, if it is not clear that the activity is required or responsible for the effect.

When should I use the protein abundance vs. the activity of a protein?

Many experimental perturbations involve the overexpression or knockdown of protein abundance. We generally represent this type of experiment using a protein abundance as the subject term instead of an activity (e.g., kinase or phosphatase) of the protein. While many proteins have a known activity, this activity is not always responsible for all downstream effects of overexpression or knockdown; in some cases, the same effects occur when either wild-type or catalytically inactive forms are expressed.

Below are examples of BEL statements for cases where:

  1. the effects of increased or decreased protein abundance are not due to the catalytic activity of the protein,

  2. effects are likely due to an increase or decrease in the activity of the protein, and

  3. not enough information is available.

Effects are not due to the catalytic activity of the protein

Example 1. A mutant telomerase (TERT) protein lacking telomerase activity retains its effects on keratinocyte proliferation (PMID 18208333):

p(MGI:Tert) -> bp(GOBP:"cell proliferation")
act(p(MGI:Tert), ma(cat)) causesNoChange bp(GOBP:"cell proliferation")

Example 2. The serine-threonine kinase RIPK1 activates NF-kappaB through a mechanism that does not involve the protein’s kinase activity (PMID 20354226).

p(HGNC:RIPK1) -> act(complex(GOCC:"NF-kappaB complex"), ma(tscript))
act(p(HGNC:RIPK1), ma(kin)) causesNoChange act(complex(GOCC:"NF-kappaB complex"), ma(tscript))
Effects are likely due to the activity of the protein

Example. Genes are differentially expressed in wild-type vs. Met knock out mouse hepatocytes, only after treatment with Hgf, the Met ligand (PMID 16710476). These genes include integrins Itgav, Itga3, and Itgb1.

act(p(MGI:Met), ma(kin)) -> r(MGI:Itgav)
act(p(MGI:Met), ma(kin)) -> r(MGI:Itga3)
act(p(MGI:Met), ma(kin)) -> r(MGI:Itgb1)

Because both Met and the Met ligand are required for the increase in gene expression, we use the activity of Met as the subject term.

Not enough information is available

Example. Genes are differentially expressed in the pancreas of mice with a pancreas-specific beta-catenin deletion (PMID 17222338). These include decreased expression of the hedgehog interacting protein Hhip in the knock-out compared to wild-type.

p(MGI:Ctnnb1) -> r(MGI:Hhip)

In this case, not enough information is available to determine if the change in Hhip expression is due to Ctnnb1 function in transcription, cell adhesion, or another role.

Relationships
When should I use a correlative relationship?

Correlative relationships are more appropriate than causal relationships to represent observations that do not clearly result from an experimental perturbation. BEL causal relationships include increases, decreases, directlyIncreases, directlyDecreases, and regulates. Correlative relationships include positiveCorrelation and negativeCorrelation.

Correlative

If the observation comes from the comparison of human tumors grouped by the occurrence of a specific mutation, then the relationship should generally be expressed as correlative. In this case there is no experimental perturbation. In this example, most patient tumor samples with an EGFR L858R mutation were observed to exhibit a reduction in ERBB2 tyrosine 1248 phosphorylation compared to wild-type samples (PMID 18687633):

p(HGNC:EGFR, var("p.Leu858Arg")) negativeCorrelation p(HGNC:ERBB2, pmod(Ph, Tyr, 1248))

In this case, no evidence is presented to suggest that the differences in ERBB2 phosphorylation are causally related to the EGFR mutation, only that the two observations are inversely correlated. Note that the subject and object terms are interchangeable for correlative relationships.
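
Because the subject and object of a correlative statement are interchangeable, a tool that stores statements may want to treat both orderings as the same statement. The following Python sketch shows one way to do that by putting the two terms into a fixed order. The function name canonical_statement and the choice of plain string ordering are assumptions made for illustration.

```python
# Relationships for which subject and object are interchangeable.
CORRELATIVE = {"positiveCorrelation", "negativeCorrelation"}


def canonical_statement(subject, relationship, obj):
    """Return a triple in which correlative terms are in sorted order."""
    if relationship in CORRELATIVE and obj < subject:
        subject, obj = obj, subject
    return (subject, relationship, obj)


egfr = 'p(HGNC:EGFR, var("p.Leu858Arg"))'
erbb2 = "p(HGNC:ERBB2, pmod(Ph, Tyr, 1248))"

# Both orderings of the correlative statement map to the same triple.
print(canonical_statement(egfr, "negativeCorrelation", erbb2))
print(canonical_statement(erbb2, "negativeCorrelation", egfr))
```

Causal relationships are left untouched by this helper, since swapping their terms would change the meaning.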

Causal

If the observation comes from the comparison of experimentally controlled states, like gene deletion, overexpression, or introduction of a mutant allele into a cell line or animal, the experimental perturbation can generally be represented as the subject term of a causal statement. In this example, DUSP6 RNA is observed to be upregulated in immortalized human bronchial epithelial cells transfected with EGFR mutant L858R, as compared to WT EGFR (PMID 16489012):

p(HGNC:EGFR, var("p.Leu858Arg")) -> r(HGNC:DUSP6)

In this case, the EGFR mutation is introduced as an experimentally-controlled perturbation.

Object Terms (Measurements)
How should I represent microarray data?

We record the results of experiments like microarrays and RT-PCR, which measure RNA abundances, by representing the object terms as RNA abundances. Only significant effects (e.g., meeting minimum criteria for fold change and statistical significance) should be recorded in BEL Statements.

In a causal BEL Statement, the subject term generally represents an experimentally manipulated entity while the object term represents a measured entity. Our general practice is to represent the object terms in BEL Statements with the terms most closely related to the experimental measurement.

This direct representation of the measurement in BEL supports the creation of KAMs to which 'omic data can be mapped directly and analyzed using automated reasoning applications like Whistle. Inference of the potential downstream consequences of RNA expression changes is supported by connection of RNA abundances to the corresponding proteins during knowledge network compilation.

4.3.2. Statement Annotations

How do I annotate a relationship observed in multiple biological contexts?

Often, the scientific literature reports a relationship as occurring across several biological contexts.

Our general practice is to represent each observation with a separate statement. Several annotations can be used to describe the same context, e.g., 'lung' and 'fibroblast', but distinct BEL statements should be used to describe each experimental context that the relationship is observed in.

Example

PMID 18650932 - siRNA knockdown of the atypical PKC-interacting protein Par-4 (PAWR) increases phosphorylation of AKT at Serine 473 in both human 293 and A549 cells.

"To test whether this is also true in human cells, we used a Par-4 siRNA to deplete endogenous Par-4 levels in human 293 cells and in the A549 human lung adenocarcinoma cell line. Cells were treated with control or Par-4-specific siRNAs, after which they were kept for 24 h in serum-free medium conditions and then stimulated with serum. Data in Figure 5E and F clearly demonstrate that the knockdown of Par-4 provokes enhanced serum-activated phospho-Akt-Ser473 levels in A549 and 293 human cells, respectively."

Important

Not recommended:

SET CellLine = {A549, 293}
p(HGNC:PAWR) -| p(SFAM:"AKT Family", pmod(Ph, Ser, 473))

Recommended:

SET CellLine = A549
p(HGNC:PAWR) -| p(SFAM:"AKT Family", pmod(Ph, Ser, 473))
SET CellLine = 293
p(HGNC:PAWR) -| p(SFAM:"AKT Family", pmod(Ph, Ser, 473))
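
When the same relationship is reported in several contexts, the recommended one-statement-per-context layout can be generated mechanically. The Python sketch below is only an illustration: the function name statements_per_context is hypothetical, and the annotation values are written in double quotes, following the SET examples elsewhere in this document.

```python
def statements_per_context(statement, annotation, values):
    """Emit one SET line followed by the statement for each context value."""
    lines = []
    for value in values:
        lines.append(f'SET {annotation} = "{value}"')
        lines.append(statement)
    return lines


pawr_akt = 'p(HGNC:PAWR) -| p(SFAM:"AKT Family", pmod(Ph, Ser, 473))'
for line in statements_per_context(pawr_akt, "CellLine", ["A549", "293"]):
    print(line)
```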

4.3.3. Modified Proteins

How do I represent a protein modification when specific information is not available?

BEL terms for post-translational modifications of proteins specify the type of modification, the modified amino acid, and the position of the modified amino acid. The modified amino acid and position are not required, so protein modifications can be represented with less specific information.

Example

Human AKT1 protein modified by phosphorylation at serine 473

p(HGNC:AKT1, pmod(Ph, Ser, 473))

Human AKT1 protein modified by phosphorylation at an unspecified serine residue

p(HGNC:AKT1, pmod(Ph, Ser))

Human AKT1 protein that has been modified by phosphorylation at an unspecified amino acid residue

p(HGNC:AKT1, pmod(Ph))

As a general rule, if specific information is available, it should be used. In some cases, this involves investigating sections of a paper outside of the evidence text, or other referenced papers, to determine which specific modifications have been measured.
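
The three levels of specificity shown above can be sketched as a Python builder with optional arguments. The function name modified_protein is hypothetical, and the rule that a position may be given only together with a residue is an assumption of this sketch, based on the three forms shown in the examples.

```python
def modified_protein(protein, modification, residue=None, position=None):
    """Build a short-form protein term with a pmod() of varying specificity."""
    if position is not None and residue is None:
        # Assumption of this sketch: a position is only meaningful
        # when the modified residue is also given.
        raise ValueError("a position requires a residue")
    arguments = [modification]
    if residue is not None:
        arguments.append(residue)
    if position is not None:
        arguments.append(str(position))
    return f"p({protein}, pmod({', '.join(arguments)}))"


print(modified_protein("HGNC:AKT1", "Ph", "Ser", 473))
print(modified_protein("HGNC:AKT1", "Ph", "Ser"))
print(modified_protein("HGNC:AKT1", "Ph"))
```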

Non-specific protein modification terms have limited value in the context of a knowledge network, in part because phosphorylation at different sites of the same protein can have opposing effects. Consider the following finding: "Akt-phosphorylated FOXO interacts with the ubiquitin ligase Skp2 and is targeted for proteasomal degradation" (PMID 15917664)

Example

Recommended:

act(p(SFAM:"AKT Family"), ma(kin)) => (act(p(HGNC:SKP2)) => deg(p(SFAM:"FOXO Family")))

Important

Not recommended:

p(SFAM:"FOXO Family", pmod(Ph)) => (act(p(HGNC:SKP2)) => deg(p(SFAM:"FOXO Family")))

The first BEL Statement indicates that the kinase activity of AKT increases the degradation of FOXO by SKP2. The second statement indicates that phosphorylation of FOXO increases the degradation of FOXO by SKP2. In this case, more information is captured by using the phosphorylating kinase AKT as the subject term instead of the unspecified phosphorylation of FOXO.

How do I represent a protein modification within a complex?

In many cases, complexes include proteins with post-translational modifications and these modifications influence complex formation. For example, HIF1A that has been hydroxylated on proline residues 402 and 564 interacts with VHL (PMID 17925579).

Our general practice is to represent this type of event as a causal statement in BEL, with the modified protein as the subject term and the complex with no specified modifications as the object term. Because the modified protein is a component of the complex, we use a direct causal relationship:

p(HGNC:HIF1A, pmod(Hy, Pro, 402)) => complex(p(HGNC:HIF1A), p(HGNC:VHL))
p(HGNC:HIF1A, pmod(Hy, Pro, 564)) => complex(p(HGNC:HIF1A), p(HGNC:VHL))

Important

While BEL allows representation of complexes with the modified proteins as components, we do not recommend this approach:

complex(p(HGNC:HIF1A, pmod(Hy, Pro, 402)), p(HGNC:VHL))

The practice of composing a complex using protein abundances without any specified modifications provides a standardized representation for complexes and allows the effects of modifications on complex formation to be captured as causal relationships. The modified forms of HIF1A, p(HGNC:HIF1A, pmod(Hy, Pro, 402)) and p(HGNC:HIF1A, pmod(Hy, Pro, 564)), are each considered a subset of the total p(HGNC:HIF1A).

This approach enables representation of the effects of multiple protein modifications on complex formation by using a causal statement for each modification.
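
As a rough illustration of this practice, the Python sketch below emits one direct causal statement per modification, always pointing at the same unmodified complex. The function name `modification_to_complex` is hypothetical, and the modifications are passed as ready-made pmod() argument strings for brevity.

```python
def modification_to_complex(protein, partner, modifications):
    """Return one 'modified protein => complex' statement per modification."""
    # The complex is composed of unmodified abundances, as recommended above.
    complex_term = f"complex(p({protein}), p({partner}))"
    return [f"p({protein}, pmod({mod})) => {complex_term}" for mod in modifications]


hif1a_statements = modification_to_complex(
    "HGNC:HIF1A", "HGNC:VHL", ["Hy, Pro, 402", "Hy, Pro, 564"]
)
print("\n".join(hif1a_statements))
```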

Warning

BEL v2.0 does not provide a specific representation of unmodified protein abundances.

Exception – modified histones bound to the promoter of a specific gene

One exception to our general practice of not specifying protein modifications within complex abundances is the interaction of specific modified histones with the promoter of a specific gene. In this example, cigarette smoke is observed to increase H3K27me3 levels at the DKK1 promoter (PMID 19351856).

a(SCHEM:"smoke condensate, cigarette (gas phase)") -> \
   complex(p(PFH:"Histone H3 Family",pmod(Me3, Lys, 27)), g(HGNC:DKK1))

In this example, the modification of the histone by trimethylation does not affect its binding to the gene DKK1. In addition, cigarette smoke does not increase or decrease the overall abundance of the modified histone, only the abundance of the modified histone at the DKK1 promoter.

How do I represent a situation where multiple phosphorylations are required for a protein’s activity?

In many cases, two or more distinct modifications are required simultaneously for protein activity, and neither modification alone is sufficient. For example, MAPK3 must be phosphorylated at two sites, Threonine 202 and Tyrosine 204, to be active. Our general practice is to take the simplest, most general approach and model the effect of each site separately:

p(HGNC:MAPK3, pmod(Ph, Thr, 202)) => act(p(HGNC:MAPK3), ma(kin))
p(HGNC:MAPK3, pmod(Ph, Tyr, 204)) => act(p(HGNC:MAPK3), ma(kin))

A multiply-modified abundance term can be used if it is of high importance to capture the requirement for both modifications:

p(HGNC:MAPK3, pmod(Ph, Thr, 202), pmod(Ph, Tyr, 204)) => act(p(HGNC:MAPK3), ma(kin))

How do I represent a situation where one protein modification initiates additional modifications?

In many cases a specific protein modification may be dependent on another modification of the same protein. In this case, the first protein modification can be modeled as the upstream cause of the second. In this example, phosphorylation of CTNNB1 at Serine 45 initiates phosphorylation of CTNNB1 at other sites including Threonine 41 by GSK3 (PMID 16618120):

p(HGNC:CTNNB1, pmod(Ph, Ser, 45)) => p(HGNC:CTNNB1, pmod(Ph, Thr, 41))

We represent this relationship as direct, because the subject and object terms have the same root abundance node.
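
The "same root abundance node" rule can be checked mechanically. The Python sketch below is illustrative only: `root_abundance` and `increase_relationship` are hypothetical helpers, the regular expression handles only simple p() terms, and falling back to `->` when the roots differ is an assumption of this sketch rather than a BEL rule (a relationship may be direct for other reasons).

```python
import re

# Matches the namespace:value parameter at the start of a p() term;
# the value may be quoted ("AKT Family") or unquoted (CTNNB1).
ROOT_PATTERN = re.compile(r'^p\(\s*([A-Za-z0-9_]+:(?:"[^"]+"|[^,\s)]+))')


def root_abundance(term):
    """Return the namespace:value root of a protein abundance term."""
    match = ROOT_PATTERN.match(term)
    if match is None:
        raise ValueError(f"not a protein abundance term: {term}")
    return match.group(1)


def increase_relationship(subject, obj):
    """Pick '=>' when both terms share a root abundance, otherwise '->'."""
    return "=>" if root_abundance(subject) == root_abundance(obj) else "->"


subject = "p(HGNC:CTNNB1, pmod(Ph, Ser, 45))"
obj = "p(HGNC:CTNNB1, pmod(Ph, Thr, 41))"
print(subject, increase_relationship(subject, obj), obj)
```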

Because the kinase mediating the second phosphorylation is known, this relationship can be modeled alternatively as a nested statement:

p(HGNC:CTNNB1, pmod(Ph, Ser, 45)) => \
   (act(p(SFAM:"GSK3 Family"), ma(kin)) => p(HGNC:CTNNB1, pmod(Ph, Thr, 41)))

How do I represent removal of a protein modification (e.g., dephosphorylation, deubiquitination)?

Removal of a specific protein modification is represented simply as a decrease in the abundance of the modified protein.

Deubiquitination

In this example, STAMBP deubiquitinates F2RL1 protein (PMID 19684015):

act(p(HGNC:STAMBP)) =| p(HGNC:F2RL1, pmod(Ub))

Deubiquitination is represented simply as a decrease in the ubiquitinated form of the protein. Because STAMBP is the deubiquitinating enzyme in this example, we use a directlyDecreases relationship.

Dephosphorylation

In this example, the phosphatase CDC25C dephosphorylates CDK1 at tyrosine 15 (PMID 1384126):

act(p(HGNC:CDC25C), ma(phos)) =| p(HGNC:CDK1, pmod(Ph, Tyr, 15))

Similar to deubiquitination, dephosphorylation is represented simply as a decrease in the modified form of the protein.

4.3.4. Reactions

How can I represent a reversible metabolic reaction?

A reversible reaction can be represented by modeling the reaction with the products and reactants interchanged.

For example, HSD11B1 acts primarily to convert cortisone to active cortisol, but in some cell types the reverse reaction is favored (PMID 12530648):

act(p(HGNC:HSD11B1), ma(cat)) => \
 rxn(reactants(a(CHEBI:NADPH), a(CHEBI:cortisone)), products(a(CHEBI:"NADP(+)"), a(CHEBI:cortisol)))
act(p(HGNC:HSD11B1), ma(cat)) => \
 rxn(reactants(a(CHEBI:"NADP(+)"), a(CHEBI:cortisol)), products(a(CHEBI:NADPH), a(CHEBI:cortisone)))

The top statement represents the forward reaction and the bottom statement represents the reverse reaction.
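
Because the reverse reaction is simply the forward reaction with reactants and products interchanged, both statements can be derived from one description of the reaction. The Python sketch below illustrates this; `rxn` and `reversible` are hypothetical helper names, and each statement is emitted on a single line rather than with a continuation backslash.

```python
def rxn(reactants, products):
    """Format a reaction term from lists of abundance terms."""
    return f"rxn(reactants({', '.join(reactants)}), products({', '.join(products)}))"


def reversible(enzyme_activity, reactants, products):
    """Return (forward, reverse) statements for a reversible reaction."""
    forward = f"{enzyme_activity} => {rxn(reactants, products)}"
    # The reverse reaction swaps the two lists.
    reverse = f"{enzyme_activity} => {rxn(products, reactants)}"
    return forward, reverse


forward, reverse = reversible(
    "act(p(HGNC:HSD11B1), ma(cat))",
    ["a(CHEBI:NADPH)", "a(CHEBI:cortisone)"],
    ['a(CHEBI:"NADP(+)")', "a(CHEBI:cortisol)"],
)
print(forward)
print(reverse)
```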

When and why should I use a reaction term?

When should I use a reaction term?

Reaction terms allow the representation of a transformation of a list of reactants into a list of products. In this example, the superoxide dismutase SOD1 converts superoxide to hydrogen peroxide:

act(p(HGNC:SOD1), ma(cat)) => rxn(reactants(a(CHEBI:superoxide)), products(a(CHEBI:"hydrogen peroxide")))

It is not necessary to include all reactants and products, especially if they are ubiquitous small molecules. In the above example the reactant hydrogen and product oxygen have been omitted from the reaction.

Why should I use a reaction term?

It is possible to represent the above reaction with separate statements linking the activity of SOD1 to decreased abundances of the reactants and increased abundances of the products:

act(p(HGNC:SOD1), ma(cat)) =| a(CHEBI:superoxide)
act(p(HGNC:SOD1), ma(cat)) => a(CHEBI:"hydrogen peroxide")

While this representation describes the function of the catalytic enzyme SOD1, it does not link the product hydrogen peroxide to the reactant superoxide.

4.3.5. Protein-Protein Interactions

How do I represent a physical interaction between two entities?

You can use a complex abundance to represent binding events between two or more abundance terms. The complex is the subject term of your BEL Statement. A BEL Statement can consist of a subject term alone; a relationship and object term are not required.

Warning

The order in which the members of the complex are listed is not important.

Examples

EPOR physically interacts with CSF2RB, the common beta-receptor (PMID 15456912)

complex(p(MGI:Epor), p(MGI:Csf2rb))

KEAP1 binds 15-deoxy-Delta(12,14)-prostaglandin J2 (PMID 15917255)

complex(p(HGNC:KEAP1), a(CHEBI:"15-deoxy-Delta(12,14)-prostaglandin J2"))

KEAP1, CUL3, and RBX1 copurify and are part of a functional E3 ubiquitin ligase complex (PMID 15572695)

complex(p(HGNC:KEAP1), p(HGNC:RBX1), p(HGNC:CUL3))

The AP-1 transcription complex binds the CCL23 promoter (PMID 17368823)

complex(complex(GOCC:"AP1 complex"), g(HGNC:CCL23))
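
Because member order is not significant, two complex terms that list the same members in a different order describe the same complex. One way to make such terms directly comparable is to sort the members before formatting, as in the Python sketch below. This is an illustration under stated assumptions: `canonical_complex` is a hypothetical helper, and plain string sorting is just one possible convention.

```python
def canonical_complex(members):
    """Format a complex term with its members in sorted order."""
    return f"complex({', '.join(sorted(members))})"


first = canonical_complex(["p(HGNC:KEAP1)", "p(HGNC:RBX1)", "p(HGNC:CUL3)"])
second = canonical_complex(["p(HGNC:CUL3)", "p(HGNC:KEAP1)", "p(HGNC:RBX1)"])
# Both orderings yield the same canonical term.
print(first)
print(first == second)
```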

4.3.6. Protein Families

When should I use a protein family instead of a specific protein?

Protein families can be used to represent protein abundances in cases where the information presented by the source does not allow identification of the specific protein. For example:

Example 1:

"Akt physically associates with MDM2 and phosphorylates it at Ser166 and Ser186." (PMID 11715018)

act(p(SFAM:"AKT Family"), ma(kin)) => p(HGNC:MDM2, pmod(Ph, Ser, 166))
act(p(SFAM:"AKT Family"), ma(kin)) => p(HGNC:MDM2, pmod(Ph, Ser, 186))

Here, Akt may refer to AKT1, AKT2, and/or AKT3.

Example 2:

"We show that Siah2 is subject to phosphorylation by p38 MAPK …​ Phosphopeptide mapping identified T24 and S29 as the primary phospho-acceptor sites." (PMID 17003045)

act(p(PFH:"MAPK p38 Family"), ma(kin)) => p(HGNC:SIAH2, pmod(Ph, Thr, 24))
act(p(PFH:"MAPK p38 Family"), ma(kin)) => p(HGNC:SIAH2, pmod(Ph, Ser, 29))

Here, it is not clear which specific p38 MAPK is responsible (MAPK11, MAPK12, MAPK13, or MAPK14).

Example 3:

"Hip encodes a membrane glycoprotein that binds to all three mammalian Hedgehog proteins." (PMID 10050855)

complex(p(SFAM:"Hedgehog Family"), p(MGI:Hhip))
complex(p(MGI:Ihh), p(MGI:Hhip))
complex(p(MGI:Shh), p(MGI:Hhip))
complex(p(MGI:Dhh), p(MGI:Hhip))

In this case, all three hedgehog family members are reported to bind to the hedgehog interacting protein (Hhip). Statements can be modeled using the family as well as each individual member.
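
When a source explicitly covers every member of a family, the family-level statement and the member-level statements can be generated together. The Python sketch below shows one way to do this; `family_and_member_complexes` is a hypothetical helper, and the member list is supplied by hand rather than looked up from a namespace.

```python
def family_and_member_complexes(family, members, partner):
    """Return complex terms for the family and for each listed member."""
    terms = [family] + list(members)
    return [f"complex(p({term}), p({partner}))" for term in terms]


hedgehog = family_and_member_complexes(
    'SFAM:"Hedgehog Family"', ["MGI:Ihh", "MGI:Shh", "MGI:Dhh"], "MGI:Hhip"
)
print("\n".join(hedgehog))
```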

5. Implementation

5.1. Formats

5.1.1. BEL Script

5.1.2. XBEL

5.1.3. Evidence JSON

5.1.4. JSON Graph Format (JGF)

6. Tools

6.1. Java

6.2. Ruby
