omicscode/README.md

Gaurav Sablok, PhD

Research and Academic Roles: Bioinformatician, Computational Biology and Genomics, NGS Specialist, Research Scientist, Bio-Software Developer, Bioinformatics CTO, and lead research roles such as Senior CTO, Senior Computational Lead and Bioinformatician/Associate. I am not interested in freelancing or technician jobs, so please do not contact me about them. I work on genomic analysis and software development for Illumina and long-read sequencing, providing bio-software development, algorithms, bioinformatics and machine learning for these technologies across all species. My analysis focus is RNA-seq, single-cell, metagenomics and pangenomics, covering all species. I also work on machine and deep learning for biological questions, biological language models, bio-software and HPC management. I prefer academic and research positions at universities and research organizations.

Academic and Research teaching: I read and write only full-stack RUST for scientific and academic research. As part of that research, I can give language classes to academics and research students. I am not a private teacher or YouTuber, and I am not interested in such freelancing roles or in collaborations of that kind. Read the software research section below for more detail on my thinking.

Sequencing specifics:
2010-2021: Plant, Bacterial, Fungi: RNASeq, GenomSeq, Phylogenomics, PacBio Sequencing, Single Cell Analysis, Bio-software.
2021-2023: Machine Learning, Bio-software.
2024: PanGenome, Bio-software.
2025: Human Genomics, Bio-software.
Software Development: C++ (2010-2021), RUST (2024-).
Web Development: RUST 🦀. I use the Axum, Rocket, Dioxus, Actix, Yew and Leptos web frameworks.
Bioinformatics/Machine and Deep Learning: RUST using Keras, Scikit, XGBoost, PyTorch, Tch, Linfa, Candle.
HPC Management: RUST.

Background:

I have a background in bioinformatics and plant biology and extensive experience in code development using several programming languages (before 2021: C++, Bash, R; after 2021: Bash, Python, RUST), covering data analysis and data science, machine and deep learning, and web and application development. Following my PhD, I developed bioinformatics methods for transcriptional and post-transcriptional genomics across nuclear and organelle genomes at Fondazione Edmund Mach (Italy). I analyzed and completed multiple RNA-seq and Organelle-Seq experiments for several plant and fungal species, including Arundo donax. I also carried out multiple metagenomics analyses of fungal and bacterial species, including ITS metagenomics and bacterial metagenomics. In addition, I have done a lot of work in organelle genomics and published the first chloroplast genomes of Cardamine species. I independently created an international partnership to find and create computational methods for a number of crop species. I then spent two years (2014–2016) as a Research Fellow (Academic Level B) at the University of Technology, Sydney, Australia, where I developed computational methods for understanding seagrasses. After that, I spent a short time at the University of Connecticut, USA, where I analyzed the Douglas fir genome, from genome annotation to phylogenomics, identifying important genes and their evolution.

Since August 2017, I have worked as a Postdoctoral researcher at the Finnish Museum of Natural History and the University of Helsinki, conducting research on genome bioinformatics and sequencing the genomes of lower plants, including Coleochaete orbicularis, Blasia pusilla, Chaetospiridium orbicularis, Polytrichum commune, Mallomonas, and Cryptomonas species. My work has focused on genome assembly, genome annotation, chloroplast genomics, and a variety of other topics. I have also worked with various other organisations, such as Edinburgh UK, to analyse the genomics data for PAFTOL species and the chloroplast genomes of the Ambrosia clade from Norway. Since 2019, my research has shifted to examining the genomes of fungi sequenced using NextSeq methods. This work is currently concentrated on genome assembly, annotation, marker genes, and phylogenomics of those fungi. I have assembled and annotated the fungal genomes of over 500 different species, identified ITS and other phylogenomic markers, and performed alignments, phylogenies, and downstream analyses on them. Up to this point, my research has mainly applied high-throughput sequencing and bioinformatics methods to understand the biological and functional importance of genes, evolution, and pathways in plants.

From 2022 to 2023, I added several new skills for career advancement. From 2024 onwards, I worked at Universität Potsdam, Germany, where I taught myself RUST and developed approaches for machine and deep learning. During that time, I benchmarked PacBio HiFi genome analysis, created a complete HTML, CSS and JavaScript enabled website, and coded several approaches and packages in Python and RUST. Since 2025, I have worked as an Area Expert at Instytut Chemii Bioorganicznej Polskiej Akademii Nauk, Poland, on human genomics, developing computational approaches for human genomics and software development.

Bioinformatics Software Research: Over the years, I have worked with several languages, including C++ (2010-2021), R (2010-2025), Python (2021-) and RUST (2024-). I have mostly relied on one systems programming language at a time: first C++ (2010-2021), then RUST, which is now the only one I use (2024-forever), with over 5K crate downloads. I still use Python and R for analysis, but mostly where no RUST counterpart exists. I believe in addressing software development with the languages I prefer rather than showing off many languages, so I pick the languages that are dominant in their field. I am not interested in any language war or hype. I use what I see as the most dominant and efficient language for solving computational problems.

As a fanatic reader and coder (I wrote and developed all the software as the single lead bioinformatician at every employment), I sometimes read a little about other languages, but that doesn't mean I plan to use them. I do this to enhance my knowledge and bring those features to the language I use. As of now, I code only in RUST as a full-stack developer across bioinformatics, software, web, and machine and deep learning. I believe a Bioinformatician/Software Developer should be well informed about the research in their area so that they can actively apply it at their place of employment. I prefer the most in-demand, ahead-of-its-time language, and hence prefer RUST 🦀. Using a select few languages for everything has let me scale my abilities at every employment, where I served as the single lead bioinformatician covering software development, bioinformatics data analysis, machine learning and HPC cloud management.

RUST development: A few of the projects are listed here; click the source for more. See the last commit tag as the final build release for each source repository. RUST is the only language I plan to use, forever, for all my software development in bioinformatics, machine and deep learning, HPC and cloud management.

  • eVaiUtilities: Variant analysis from the eVai.
  • panscape: Pangenome long reads
  • sequenceprofiler: Profiling sequence kmers for histograms
  • phyloevolve: Long reads and alignments from the multiple alignments.
  • hpcMapper: DevOps system management for high-performance computing.
  • bacdive: Bacterial genome analysis from Bacdive.
  • NLRanalyzer: Complete kit for analyzing NLR.
  • humanCAST: Complete kit for human genome analysis.
  • araseq: Complete kit for the Arabidopsis genome information.
  • minifyseq: Noise removal from long reads, including machine-learning-based removal.
  • CAGanalyzer: Analyzing the CAG repeats from the human genome.
  • doiTAG: Generating DOIs for sequences from next generation sequencing.
  • varLinker: Analyzing and linking variants for annotation.
  • vcfFilter: Filtering VCF files
  • rustRet: Analyzing mass spectrometry data.
  • repgnerate: Analyzing the sequencing information post sequencing.
  • proteogenomics: implementing the proteogenomics methods.
  • geomapper: a complete kit for geospatial analysis for German geo.
  • ensemblcov: a complete kit for using the ensemblcov at your commandline.
  • bactiPAN: complete bacterial pangenome analyzer both in shell and RUST.
  • varView: Graphics-enabled terminal variant analyzer in Go.
  • doseqGO: A complete sequence information portal for standalone sequence information.
  • varview: Parallel threaded SAM viewer and filter.
  • fastscan: Scan high-throughput sequencing files.
  • vcfscan: Scan all the variants and filter the variants.

Pinned repositories:

  • eVaiutilities: Genomics population scale utilities for eVai (Rust)
  • panscape: reads to genome, pangenome graphs, summarize and analyze (Rust)
  • bacdive: bacdive microbial genome informatics (Rust)
  • sequenceprofiler: kmer based sequence profiler for reads and genomes (Rust)
  • hpcMapper: RUST hpcMapper (Rust)
  • phyloevolve: phylogenomics for genomics (Rust)