LRez  v2.1
utils.h File Reference
#include <string>
#include <vector>
#include "api/BamReader.h"
#include "api/BamIndex.h"
#include "api/BamAux.h"
#include "robin_hood.h"
#include <sstream>
#include <regex>
#include "barcodesList.h"

Macros

#define BXTAG   "BX:Z"
 
#define no_argument   0
 
#define required_argument   1
 
#define optional_argument   2
 

Typedefs

typedef vector< bool > barcode
 

Enumerations

enum  SequencingTechnology {
  Undefined = 0, TenX, Haplotagging, TELLSeq,
  stLFR
}
 

Functions

SequencingTechnology determineSequencingTechnology (const string &barcode)
 
string retrieveNucleotidesContent (const string &barcode)
 
bool isValidBarcode (const string &barcode)
 
barcode stringToBarcode (const string &str)
 
vector< string > splitString (string s, string delimiter)
 
BamRegion stringToBamRegion (BamReader &reader, string s)
 
vector< string > extractRegions (string chromosome, int32_t chromosomeSize, unsigned regionSize)
 
vector< string > extractRegionsList (BamReader &reader, unsigned regionSize)
 
string convertToSam (const BamAlignment &a, RefVector m_references)
 

Variables

SequencingTechnology techno
 

Macro Definition Documentation

◆ BXTAG

#define BXTAG   "BX:Z"

Definition at line 14 of file utils.h.

◆ no_argument

#define no_argument   0

Definition at line 16 of file utils.h.

◆ optional_argument

#define optional_argument   2

Definition at line 18 of file utils.h.

◆ required_argument

#define required_argument   1

Definition at line 17 of file utils.h.

Typedef Documentation

◆ barcode

typedef vector<bool> barcode

Definition at line 23 of file utils.h.

Enumeration Type Documentation

◆ SequencingTechnology

Supported sequencing technologies

Enumerator
Undefined 
TenX 
Haplotagging 
TELLSeq 
stLFR 

Definition at line 28 of file utils.h.

Function Documentation

◆ convertToSam()

string convertToSam ( const BamAlignment &  a,
RefVector  m_references 
)

Translate a BamAlignment to a SAM-like string.

Parameters
a              BamAlignment to translate
m_references   vector containing the information (name and length) about reference sequences
Returns
a SAM-like string summarizing the information of a

◆ determineSequencingTechnology()

SequencingTechnology determineSequencingTechnology ( const string &  barcode)

Determine the sequencing technology a barcode originates from. This function is compatible with 10x Genomics, Haplotagging, TELL-Seq, and stLFR. Barcodes that do not come from one of these technologies and are not represented as a sequence of nucleotides will cause the program to exit.

Parameters
barcode   the barcode to determine the sequencing technology from
Exceptions
runtime_error   thrown if a barcode pattern could not be converted to a regexp or if the sequencing technology used was not recognized
Returns
The SequencingTechnology enum field corresponding to the sequencing technology
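
The regexp-based detection described above could look roughly like the following sketch. The function name `guessTechnology` is hypothetical, and the four patterns are assumptions about typical barcode layouts (16 nucleotides plus a numeric suffix for 10x, `AxxCxxBxxDxx` for Haplotagging, 18 nucleotides for TELL-Seq, three underscore-separated integers for stLFR); they are not taken from the LRez sources. Unlike the documented function, this sketch rejects every barcode that matches none of the patterns.

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>

// Mirrors the enumeration documented on this page.
enum SequencingTechnology { Undefined = 0, TenX, Haplotagging, TELLSeq, stLFR };

// Hypothetical classifier: all patterns below are assumptions, not LRez's own.
SequencingTechnology guessTechnology(const std::string &barcode) {
    // Assumed 10x layout: 16 nucleotides, a dash, then a numeric suffix.
    static const std::regex tenXPattern("^[ACGTN]{16}-[0-9]+$");
    // Assumed Haplotagging layout: AxxCxxBxxDxx with two-digit indices.
    static const std::regex haplotaggingPattern("^A[0-9]{2}C[0-9]{2}B[0-9]{2}D[0-9]{2}$");
    // Assumed TELL-Seq layout: 18 nucleotides.
    static const std::regex tellSeqPattern("^[ACGTN]{18}$");
    // Assumed stLFR layout: three integers joined by underscores.
    static const std::regex stlfrPattern("^[0-9]+_[0-9]+_[0-9]+$");

    if (std::regex_match(barcode, tenXPattern)) return TenX;
    if (std::regex_match(barcode, haplotaggingPattern)) return Haplotagging;
    if (std::regex_match(barcode, tellSeqPattern)) return TELLSeq;
    if (std::regex_match(barcode, stlfrPattern)) return stLFR;

    // No pattern matched: report the failure to the caller.
    throw std::runtime_error("Unrecognized barcode format: " + barcode);
}
```

With these assumed patterns, `guessTechnology("A01C02B03D04")` yields `Haplotagging` and `guessTechnology("12_345_6")` yields `stLFR`.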

◆ extractRegions()

vector<string> extractRegions ( string  chromosome,
int32_t  chromosomeSize,
unsigned  regionSize 
)

Extract all regions of a given size from a given chromosome.

Parameters
chromosome       chromosome of interest
chromosomeSize   size of the chromosome
regionSize       size of the regions to extract
Returns
a list of all regions of the specified size on the chromosome
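
As an illustration of the tiling this function performs, here is a minimal sketch. The name `sketchExtractRegions` is hypothetical. The `chromosome:start-end` output format follows the one documented for stringToBamRegion(); the 0-based, half-open coordinates and the truncation of the last region at the chromosome end are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: tile a chromosome into consecutive fixed-size regions.
// Coordinates are assumed to be 0-based and half-open.
std::vector<std::string> sketchExtractRegions(const std::string &chromosome,
                                              int32_t chromosomeSize,
                                              unsigned regionSize) {
    assert(regionSize > 0);  // a zero-sized region would never advance
    std::vector<std::string> regions;
    for (int64_t start = 0; start < chromosomeSize; start += regionSize) {
        // The final region is shortened so it does not run past the chromosome end.
        const int64_t end = std::min<int64_t>(start + regionSize, chromosomeSize);
        regions.push_back(chromosome + ":" + std::to_string(start) + "-" +
                          std::to_string(end));
    }
    return regions;
}
```

Under these assumptions, a chromosome of size 250 split into regions of size 100 gives `chr1:0-100`, `chr1:100-200`, and `chr1:200-250`.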

◆ extractRegionsList()

vector<string> extractRegionsList ( BamReader &  reader,
unsigned  regionSize 
)

Extract all regions from all chromosomes.

Parameters
reader       BamReader open on the desired BAM file
regionSize   size of the regions to extract
Exceptions
runtime_error   thrown if a contig name could not be converted to an ID or if the reader could not jump to a region
Returns
a list of all regions of specified size of all the chromosomes

◆ isValidBarcode()

bool isValidBarcode ( const string &  barcode)

Check whether a barcode is valid. A barcode is considered valid if it is not empty and, depending on the technology, it does not contain any "N" (10x and TELL-Seq), it is not "0_0_0" (stLFR), or it does not contain a "00" substring (Haplotagging). The function determines the sequencing technology itself.

Parameters
barcode   the barcode to verify
Exceptions
runtime_error   thrown if the sequencing technology could not be recognized
Returns
true if the barcode is valid, false otherwise
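
The validity rules listed above can be sketched as follows. The name `sketchIsValidBarcode` is hypothetical, and, unlike the documented function, this sketch takes the technology as an explicit argument rather than detecting it from the barcode.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Mirrors the enumeration documented on this page.
enum SequencingTechnology { Undefined = 0, TenX, Haplotagging, TELLSeq, stLFR };

// Hypothetical sketch of the documented rules; the technology is passed in
// explicitly here instead of being detected from the barcode.
bool sketchIsValidBarcode(const std::string &barcode, SequencingTechnology technology) {
    if (barcode.empty()) return false;
    switch (technology) {
        case TenX:
        case TELLSeq:
            // Nucleotide barcodes must not contain an undetermined base.
            return barcode.find('N') == std::string::npos;
        case stLFR:
            // "0_0_0" is the barcode the documentation singles out as invalid.
            return barcode != "0_0_0";
        case Haplotagging:
            // A "00" substring makes the barcode invalid.
            return barcode.find("00") == std::string::npos;
        default:
            throw std::runtime_error("Unrecognized sequencing technology");
    }
}
```

For example, `sketchIsValidBarcode("0_0_0", stLFR)` is false, while `sketchIsValidBarcode("1_2_3", stLFR)` is true.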

◆ retrieveNucleotidesContent()

string retrieveNucleotidesContent ( const string &  barcode)

Retrieve the nucleotide content of the barcode. This function translates barcodes represented as a sequence of integers (as in stLFR and Haplotagging) into nucleotide barcodes. The function determines the sequencing technology itself.

Parameters
barcode   the barcode to retrieve nucleotides for
Exceptions
runtime_error   thrown if the sequencing technology could not be recognized
Returns
the barcode in nucleotide representation

◆ splitString()

vector<string> splitString ( string  s,
string  delimiter 
)

Split a string according to a delimiter.

Parameters
s           string to split
delimiter   delimiter
Returns
a vector containing the splits of the string
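
A common way to write such a split is shown below. The name `sketchSplitString` is hypothetical, and keeping empty fields (between adjacent delimiters, or for an empty input) is an assumption; the documentation does not say how the real function treats them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: split s on every occurrence of delimiter.
// Empty fields are kept, which is an assumption about the desired behavior.
std::vector<std::string> sketchSplitString(const std::string &s,
                                           const std::string &delimiter) {
    assert(!delimiter.empty());  // an empty delimiter would loop forever
    std::vector<std::string> parts;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        parts.push_back(s.substr(start, pos - start));
        start = pos + delimiter.size();
    }
    // Whatever follows the last delimiter (possibly nothing) is the final field.
    parts.push_back(s.substr(start));
    return parts;
}
```

Splitting `"chr1:100-200"` on `":"` with this sketch yields the two fields `chr1` and `100-200`.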

◆ stringToBamRegion()

BamRegion stringToBamRegion ( BamReader &  reader,
string  s 
)

Translate a string to a BamRegion.

Parameters
reader   BamReader open on the desired BAM file
s        string to translate, formatted as chromosome:startPosition-endPosition
Exceptions
runtime_error   thrown if a region could not be converted to a BamRegion or if a contig name could not be converted to an ID
Returns
the BamRegion corresponding to the string s
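
The parsing half of this conversion can be sketched without BamTools. `ParsedRegion` and `parseRegion` are hypothetical names; the sketch only splits the `chromosome:startPosition-endPosition` string and leaves out the step where the contig name is resolved to a reference ID through the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the textual parts of a region.
struct ParsedRegion {
    std::string chromosome;
    int start;
    int end;
};

// Hypothetical sketch: parse "chromosome:start-end" into its three parts.
ParsedRegion parseRegion(const std::string &s) {
    // Use the last ':' so contig names that contain ':' are still handled.
    const std::size_t colon = s.rfind(':');
    if (colon == std::string::npos || colon == 0) {
        throw std::runtime_error("Malformed region: " + s);
    }
    const std::size_t dash = s.find('-', colon + 1);
    if (dash == std::string::npos) {
        throw std::runtime_error("Malformed region: " + s);
    }
    ParsedRegion region;
    region.chromosome = s.substr(0, colon);
    try {
        region.start = std::stoi(s.substr(colon + 1, dash - colon - 1));
        region.end = std::stoi(s.substr(dash + 1));
    } catch (const std::exception &) {
        // std::stoi reports non-numeric or out-of-range positions by throwing.
        throw std::runtime_error("Malformed region: " + s);
    }
    return region;
}
```

In real use, the chromosome name would then be looked up in the open BAM file to obtain the numeric reference ID that a `BamRegion` stores.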

◆ stringToBarcode()

barcode stringToBarcode ( const string &  str)

Translate a string to a barcode in 2-bits-per-nucleotide format. The function determines the sequencing technology itself and retrieves the nucleotide content of barcodes represented as a sequence of integers.

Parameters
str   string to convert
Returns
the barcode in binary representation
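
To make the 2-bits-per-nucleotide layout concrete, here is a minimal sketch. The name `sketchStringToBarcode` is hypothetical, and the bit assignment (A=00, C=01, G=10, T=11) is an assumption; the documentation does not state which mapping LRez uses. The sketch handles plain nucleotide strings only and skips the technology detection and integer-barcode translation that the documented function performs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Same typedef as documented on this page.
typedef std::vector<bool> barcode;

// Hypothetical sketch: pack each nucleotide into two bits.
// Assumed mapping: A=00, C=01, G=10, T=11.
barcode sketchStringToBarcode(const std::string &nucleotides) {
    barcode bits;
    bits.reserve(2 * nucleotides.size());
    for (char c : nucleotides) {
        switch (c) {
            case 'A': bits.push_back(false); bits.push_back(false); break;
            case 'C': bits.push_back(false); bits.push_back(true);  break;
            case 'G': bits.push_back(true);  bits.push_back(false); break;
            case 'T': bits.push_back(true);  bits.push_back(true);  break;
            default:
                throw std::invalid_argument("Unexpected character in barcode");
        }
    }
    return bits;
}
```

A 16-nucleotide barcode therefore occupies 32 bits, which `std::vector<bool>` stores compactly.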

Variable Documentation

◆ techno