From 5af7885e96cdc84f55a751bea5b3aa6101c1a46c Mon Sep 17 00:00:00 2001 From: jdhenaos Date: Tue, 4 Nov 2025 09:27:57 +0100 Subject: [PATCH 1/6] omicslog first blog --- .../index/execute-results/html.json | 15 ++++ .../2025-11-04-introducing-omicslog/index.qmd | 87 +++++++++++++++++++ 2 files changed, 102 insertions(+) create mode 100755 _freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json create mode 100755 posts/2025-11-04-introducing-omicslog/index.qmd diff --git a/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json b/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json new file mode 100755 index 0000000..cf7411e --- /dev/null +++ b/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json @@ -0,0 +1,15 @@ +{ + "hash": "18009d7c46c21262f594b6a74675e3e4", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: \"Omicslog\"\nauthor: \"Juan Henao\"\ndate: \"2025-10-23\"\npackage: tidyomics\ntags:\n - tidyomics/tidyomicsBlog\n - logging\n - tidyverse\n - bioconductor\ndescription: \"Providing logging capabilities for SummarizedExperiment objects.\"\nformat:\n html:\n toc: true\n toc-float: true\nexecute:\n freeze: true\n---\n\n\n\n# Welcome to omicslog!\n\nI still remember being in front of my PI, trying to recall, or even worse, to guess the number of samples we had ignored when running a specific analysis, such as filtering low-count genes for DEG analysis or excluding biological samples collected outside a target time window when we aimed to discover biomarker candidates for early disease detection. I especially remember how that became even worse across the different projects we were working on simultaneously.\n\nThe solution was always the same: rerun the whole code up to the line that could answer those questions. 
That was even more frustrating considering that, in many cases, those questions came from pure curiosity rather than information that would be included in the final publication.\n\nInspired by the `lab notebook` from my wet lab colleagues and the `tidylog` package, we present `omicslog`, a package that provides logging capabilities for omics-oriented objects. Our goal is to establish a standard for tracking changes to these objects, acting as an automated dry lab notebook and improving the reproducibility of specific analyses.\n\nWe started by enabling logging for the `SummarizedExperiment` class, powered by `tidyomics` functionalities. Every function in the pipeline is evaluated, with changes traced and aggregated as metadata.\n\nLet’s start with a practical example, beginning with the package installation and library loading:\n\n\n\n::: {.cell messages='false' warnings='false'}\n::: {.cell-output .cell-output-stderr}\n\n```\nSkipping install of 'omicslog' from a github remote, the SHA1 (eeb57643) has not changed since last install.\n Use `force = TRUE` to force installation\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: MatrixGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: matrixStats\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'MatrixGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,\n colCounts, colCummaxs, colCummins, colCumprods, colCumsums,\n colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,\n colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,\n colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,\n colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,\n colWeightedMeans, colWeightedMedians, colWeightedSds,\n colWeightedVars, rowAlls, 
rowAnyNAs, rowAnys, rowAvgsPerColSet,\n rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,\n rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,\n rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,\n rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,\n rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,\n rowWeightedMads, rowWeightedMeans, rowWeightedMedians,\n rowWeightedSds, rowWeightedVars\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomicRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: stats4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: BiocGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'BiocGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:stats':\n\n IQR, mad, sd, var, xtabs\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n anyDuplicated, aperm, append, as.data.frame, basename, cbind,\n colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,\n get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,\n match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,\n Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,\n table, tapply, union, unique, unsplit, which.max, which.min\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: S4Vectors\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'S4Vectors'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n expand.grid, I, unname\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: IRanges\n```\n\n\n:::\n\n::: {.cell-output 
.cell-output-stderr}\n\n```\nLoading required package: GenomeInfoDb\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: Biobase\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWelcome to Bioconductor\n\n Vignettes contain introductory material; view with\n 'browseVignettes()'. To cite Bioconductor, see\n 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'Biobase'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:MatrixGenerics':\n\n rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n anyMissing, rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'tidySummarizedExperiment'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:IRanges':\n\n slice\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:S4Vectors':\n\n rename\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:matrixStats':\n\n count\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:stats':\n\n filter\n```\n\n\n:::\n:::\n\n\n\nFor this example, we worked with the `airway` dataset. 
To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(airway, package=\"airway\")\n\nresult <- \n airway |>\n log_start() |> # Starting the logging operations\n filter(dex == \"untrt\") |>\n select(!albut) |>\n mutate(dex_upper = toupper(dex)) |>\n extract(col = dex,into = \"treat\") |>\n mutate(Run = tolower(Run)) |> \n filter(.feature == \"ENSG00000000003\") |>\n slice(3)\n\nresult\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A SummarizedExperiment-tibble abstraction: 1 × 1\n# \u001b[90mFeatures=1 | Samples=1 | Assays=counts\u001b[0m\n .feature .sample counts SampleName cell treat Run avgLength Experiment\n \n1 ENSG00000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 SRX384353 \n# ℹ 3 more variables: Sample , BioSample , dex_upper \n\nOperation log:\n[2025-11-04 08:23:29] filter: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:23:29] select: removed 1 (11%), 8 column(s) remaining\n[2025-11-04 08:23:29] mutate: added 1 new column(s): dex_upper\n[2025-11-04 08:23:29] extract: extracted 'dex' into column: treat (original removed)\n[2025-11-04 08:23:30] mutate: modified column(s): Run\n[2025-11-04 08:23:30] filter: removed 64101 genes (100%), 1 genes remaining\n[2025-11-04 08:23:30] slice: Kept 1/4 rows (25.0%); removed 3 rows \n```\n\n\n:::\n:::\n\n\n\nAs a result, the `metadata` shows a short description of the different modifications the `SummarizedExperiment` underwent during each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`.\n\nNotwithstanding, `omicslog` can also work with base R commands by simply adding the dataset name to the `log_start()` function:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(restore_SummarizedExperiment_show = TRUE)\n\nresult_base <- log_start(airway) # Starting the logging operations\n\nresult_base <- result_base[, 
colData(result_base)$dex == \"untrt\"]\ncolData(result_base)$dex_upper <- toupper(colData(result_base)$dex)\ncolData(result_base)$Run <- tolower(colData(result_base)$Run)\nresult_base <- result_base[rownames(result_base) == \"ENSG00000000003\", ]\n\nresult_base\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nclass: SummarizedExperimentLogged \ndim: 1 4 \nmetadata(1): ''\nassays(1): counts\nrownames(1): ENSG00000000003\nrowData names(0):\ncolnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520\ncolData names(10): SampleName cell ... BioSample dex_upper\n\nOperation log:\n[2025-11-04 08:23:31] subset: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:23:31] colData<-: added 1 new column(s): dex_upper\n[2025-11-04 08:23:31] colData<-: modified column 'Run'\n[2025-11-04 08:23:31] subset: removed 64101 genes (100%), 1 genes remaining \n```\n\n\n:::\n:::\n\n\n\n# We need your feedback!\n\n**Tell us your stories:** \n\n* What is your experience working with omics-oriented objects? \n* What difficulties have you faced when tracing changes across different experiments? 
\n* What else can we do to make your research more comfortable and easier to track?\n\nDon’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog \"logging capabilities for SummarizedExperiment objects\") GitHub repo.", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/posts/2025-11-04-introducing-omicslog/index.qmd b/posts/2025-11-04-introducing-omicslog/index.qmd new file mode 100755 index 0000000..0d1f330 --- /dev/null +++ b/posts/2025-11-04-introducing-omicslog/index.qmd @@ -0,0 +1,87 @@ +--- +title: "Omicslog" +author: "Juan Henao" +date: "2025-10-23" +package: tidyomics +tags: + - tidyomics/tidyomicsBlog + - logging + - tidyverse + - bioconductor +description: "Providing logging capabilities for SummarizedExperiment objects." +format: + html: + toc: true + toc-float: true +execute: + freeze: true +--- + +# Welcome to omicslog! + +I still remember being in front of my PI, trying to recall, or even worse, to guess the number of samples we had ignored when running a specific analysis, such as filtering low-count genes for DEG analysis or excluding biological samples collected outside a target time window when we aimed to discover biomarker candidates for early disease detection. I especially remember how that became even worse across the different projects we were working on simultaneously. + +The solution was always the same: rerun the whole code up to the line that could answer those questions. That was even more frustrating considering that, in many cases, those questions came from pure curiosity rather than information that would be included in the final publication. + +Inspired by the `lab notebook` from my wet lab colleagues and the `tidylog` package, we present `omicslog`, a package that provides logging capabilities for omics-oriented objects. 
Our goal is to establish a standard for tracking changes to these objects, acting as an automated dry lab notebook and improving the reproducibility of specific analyses. + +We started by enabling logging for the `SummarizedExperiment` class, powered by `tidyomics` functionalities. Every function in the pipeline is evaluated, with changes traced and aggregated as metadata. + +Let’s start with a practical example, beginning with the package installation and library loading: + +```{r, echo=FALSE, message=FALSE, warning=FALSE} +if (!require("devtools", quietly = TRUE)) + install.packages("devtools") + +devtools::install_github("tidyomics/omicslog") + +library(SummarizedExperiment) +library(tidySummarizedExperiment) +library(omicslog) +``` + +For this example, we worked with the `airway` dataset. To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria: + +```{r} +data(airway, package="airway") + +result <- + airway |> + log_start() |> # Starting the logging operations + filter(dex == "untrt") |> + select(!albut) |> + mutate(dex_upper = toupper(dex)) |> + extract(col = dex, into = "treat") |> + mutate(Run = tolower(Run)) |> + filter(.feature == "ENSG00000000003") |> + slice(3) + +result +``` + +As a result, the `metadata` shows a short description of the different modifications the `SummarizedExperiment` underwent during each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`. 
+ +`omicslog` also works with base R commands; simply pass the dataset to `log_start()`: + +```{r} +options(restore_SummarizedExperiment_show = TRUE) + +result_base <- log_start(airway) # Starting the logging operations + +result_base <- result_base[, colData(result_base)$dex == "untrt"] +colData(result_base)$dex_upper <- toupper(colData(result_base)$dex) +colData(result_base)$Run <- tolower(colData(result_base)$Run) +result_base <- result_base[rownames(result_base) == "ENSG00000000003", ] + +result_base +``` + +# We need your feedback! + +**Tell us your stories:** + +* What is your experience working with omics-oriented objects? +* What difficulties have you faced when tracing changes across different experiments? +* What else can we do to make your research more comfortable and easier to track? + +Don’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog "logging capabilities for SummarizedExperiment objects") GitHub repo. 
\ No newline at end of file From 3ffa9427d57a03830992cc83d208fb78e071f1b4 Mon Sep 17 00:00:00 2001 From: jdhenaos Date: Tue, 4 Nov 2025 09:35:45 +0100 Subject: [PATCH 2/6] omicslog first blog --- .../index/execute-results/html.json | 4 ++-- posts/2025-11-04-introducing-omicslog/index.qmd | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json b/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json index cf7411e..cad4082 100755 --- a/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json +++ b/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "18009d7c46c21262f594b6a74675e3e4", + "hash": "a401b576ed0644f65c6f9accb3cb05e0", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Omicslog\"\nauthor: \"Juan Henao\"\ndate: \"2025-10-23\"\npackage: tidyomics\ntags:\n - tidyomics/tidyomicsBlog\n - logging\n - tidyverse\n - bioconductor\ndescription: \"Providing logging capabilities for SummarizedExperiment objects.\"\nformat:\n html:\n toc: true\n toc-float: true\nexecute:\n freeze: true\n---\n\n\n\n# Welcome to omicslog!\n\nI still remember being in front of my PI, trying to recall, or even worse, to guess the number of samples we had ignored when running a specific analysis, such as filtering low-count genes for DEG analysis or excluding biological samples collected outside a target time window when we aimed to discover biomarker candidates for early disease detection. I especially remember how that became even worse across the different projects we were working on simultaneously.\n\nThe solution was always the same: rerun the whole code up to the line that could answer those questions. 
That was even more frustrating considering that, in many cases, those questions came from pure curiosity rather than information that would be included in the final publication.\n\nInspired by the `lab notebook` from my wet lab colleagues and the `tidylog` package, we present `omicslog`, a package that provides logging capabilities for omics-oriented objects. Our goal is to establish a standard for tracking changes to these objects, acting as an automated dry lab notebook and improving the reproducibility of specific analyses.\n\nWe started by enabling logging for the `SummarizedExperiment` class, powered by `tidyomics` functionalities. Every function in the pipeline is evaluated, with changes traced and aggregated as metadata.\n\nLet’s start with a practical example, beginning with the package installation and library loading:\n\n\n\n::: {.cell messages='false' warnings='false'}\n::: {.cell-output .cell-output-stderr}\n\n```\nSkipping install of 'omicslog' from a github remote, the SHA1 (eeb57643) has not changed since last install.\n Use `force = TRUE` to force installation\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: MatrixGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: matrixStats\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'MatrixGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,\n colCounts, colCummaxs, colCummins, colCumprods, colCumsums,\n colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,\n colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,\n colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,\n colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,\n colWeightedMeans, colWeightedMedians, colWeightedSds,\n colWeightedVars, rowAlls, 
rowAnyNAs, rowAnys, rowAvgsPerColSet,\n rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,\n rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,\n rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,\n rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,\n rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,\n rowWeightedMads, rowWeightedMeans, rowWeightedMedians,\n rowWeightedSds, rowWeightedVars\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomicRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: stats4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: BiocGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'BiocGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:stats':\n\n IQR, mad, sd, var, xtabs\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n anyDuplicated, aperm, append, as.data.frame, basename, cbind,\n colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,\n get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,\n match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,\n Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,\n table, tapply, union, unique, unsplit, which.max, which.min\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: S4Vectors\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'S4Vectors'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n expand.grid, I, unname\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: IRanges\n```\n\n\n:::\n\n::: {.cell-output 
.cell-output-stderr}\n\n```\nLoading required package: GenomeInfoDb\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: Biobase\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWelcome to Bioconductor\n\n Vignettes contain introductory material; view with\n 'browseVignettes()'. To cite Bioconductor, see\n 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'Biobase'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:MatrixGenerics':\n\n rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n anyMissing, rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'tidySummarizedExperiment'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:IRanges':\n\n slice\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:S4Vectors':\n\n rename\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:matrixStats':\n\n count\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:stats':\n\n filter\n```\n\n\n:::\n:::\n\n\n\nFor this example, we worked with the `airway` dataset. 
To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(airway, package=\"airway\")\n\nresult <- \n airway |>\n log_start() |> # Starting the logging operations\n filter(dex == \"untrt\") |>\n select(!albut) |>\n mutate(dex_upper = toupper(dex)) |>\n extract(col = dex,into = \"treat\") |>\n mutate(Run = tolower(Run)) |> \n filter(.feature == \"ENSG00000000003\") |>\n slice(3)\n\nresult\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A SummarizedExperiment-tibble abstraction: 1 × 1\n# \u001b[90mFeatures=1 | Samples=1 | Assays=counts\u001b[0m\n .feature .sample counts SampleName cell treat Run avgLength Experiment\n \n1 ENSG00000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 SRX384353 \n# ℹ 3 more variables: Sample , BioSample , dex_upper \n\nOperation log:\n[2025-11-04 08:23:29] filter: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:23:29] select: removed 1 (11%), 8 column(s) remaining\n[2025-11-04 08:23:29] mutate: added 1 new column(s): dex_upper\n[2025-11-04 08:23:29] extract: extracted 'dex' into column: treat (original removed)\n[2025-11-04 08:23:30] mutate: modified column(s): Run\n[2025-11-04 08:23:30] filter: removed 64101 genes (100%), 1 genes remaining\n[2025-11-04 08:23:30] slice: Kept 1/4 rows (25.0%); removed 3 rows \n```\n\n\n:::\n:::\n\n\n\nAs a result, the `metadata` shows a short description of the different modifications the `SummarizedExperiment` underwent during each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`.\n\nNotwithstanding, `omicslog` can also work with base R commands by simply adding the dataset name to the `log_start()` function:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(restore_SummarizedExperiment_show = TRUE)\n\nresult_base <- log_start(airway) # Starting the logging operations\n\nresult_base <- result_base[, 
colData(result_base)$dex == \"untrt\"]\ncolData(result_base)$dex_upper <- toupper(colData(result_base)$dex)\ncolData(result_base)$Run <- tolower(colData(result_base)$Run)\nresult_base <- result_base[rownames(result_base) == \"ENSG00000000003\", ]\n\nresult_base\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nclass: SummarizedExperimentLogged \ndim: 1 4 \nmetadata(1): ''\nassays(1): counts\nrownames(1): ENSG00000000003\nrowData names(0):\ncolnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520\ncolData names(10): SampleName cell ... BioSample dex_upper\n\nOperation log:\n[2025-11-04 08:23:31] subset: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:23:31] colData<-: added 1 new column(s): dex_upper\n[2025-11-04 08:23:31] colData<-: modified column 'Run'\n[2025-11-04 08:23:31] subset: removed 64101 genes (100%), 1 genes remaining \n```\n\n\n:::\n:::\n\n\n\n# We need your feedback!\n\n**Tell us your stories:** \n\n* What is your experience working with omics-oriented objects? \n* What difficulties have you faced when tracing changes across different experiments? 
\n* What else can we do to make your research more comfortable and easier to track?\n\nDon’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog \"logging capabilities for SummarizedExperiment objects\") GitHub repo.", + "markdown": "---\ntitle: \"Omicslog\"\nauthor: \"Juan Henao\"\ndate: \"2025-11-04\"\npackage: tidyomics\ntags:\n - tidyomics/tidyomicsBlog\n - logging\n - tidyverse\n - bioconductor\ndescription: \"Providing logging capabilities for SummarizedExperiment objects.\"\nformat:\n html:\n toc: true\n toc-float: true\nexecute:\n freeze: true\n---\n\n\n\n# Welcome to omicslog!\n\nI still remember being in front of my PI, trying to recall, or even worse, to guess the number of samples we had ignored when running a specific analysis, such as filtering low-count genes for DEG analysis or excluding biological samples collected outside a target time window when we aimed to discover biomarker candidates for early disease detection. I especially remember how that became even worse across the different projects we were working on simultaneously.\n\nThe solution was always the same: rerun the whole code up to the line that could answer those questions. That was even more frustrating considering that, in many cases, those questions came from pure curiosity rather than information that would be included in the final publication.\n\nInspired by the `lab notebook` from my wet lab colleagues and the `tidylog` package, we present `omicslog`, a package that provides logging capabilities for omics-oriented objects. Our goal is to establish a standard for tracking changes to these objects, acting as an automated dry lab notebook and improving the reproducibility of specific analyses.\n\nWe started by enabling logging for the `SummarizedExperiment` class, powered by `tidyomics` functionalities. 
Every function in the pipeline is evaluated, with changes traced and aggregated as metadata.\n\nLet’s start with a practical example, beginning with the package installation and library loading:\n\n\n\n::: {.cell messages='false' warnings='false'}\n::: {.cell-output .cell-output-stderr}\n\n```\nSkipping install of 'omicslog' from a github remote, the SHA1 (eeb57643) has not changed since last install.\n Use `force = TRUE` to force installation\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: MatrixGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: matrixStats\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'MatrixGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,\n colCounts, colCummaxs, colCummins, colCumprods, colCumsums,\n colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,\n colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,\n colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,\n colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,\n colWeightedMeans, colWeightedMedians, colWeightedSds,\n colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,\n rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,\n rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,\n rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,\n rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,\n rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,\n rowWeightedMads, rowWeightedMeans, rowWeightedMedians,\n rowWeightedSds, rowWeightedVars\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomicRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: 
stats4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: BiocGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'BiocGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:stats':\n\n IQR, mad, sd, var, xtabs\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n anyDuplicated, aperm, append, as.data.frame, basename, cbind,\n colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,\n get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,\n match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,\n Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,\n table, tapply, union, unique, unsplit, which.max, which.min\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: S4Vectors\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'S4Vectors'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n expand.grid, I, unname\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: IRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomeInfoDb\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: Biobase\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWelcome to Bioconductor\n\n Vignettes contain introductory material; view with\n 'browseVignettes()'. 
To cite Bioconductor, see\n 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'Biobase'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:MatrixGenerics':\n\n rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n anyMissing, rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'tidySummarizedExperiment'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:IRanges':\n\n slice\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:S4Vectors':\n\n rename\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:matrixStats':\n\n count\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:stats':\n\n filter\n```\n\n\n:::\n:::\n\n\n\nFor this example, we worked with the `airway` dataset. 
To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(airway, package=\"airway\")\n\nresult <- \n airway |>\n log_start() |> # Starting the logging operations\n filter(dex == \"untrt\") |>\n select(!albut) |>\n mutate(dex_upper = toupper(dex)) |>\n extract(col = dex,into = \"treat\") |>\n mutate(Run = tolower(Run)) |> \n filter(.feature == \"ENSG00000000003\") |>\n slice(3)\n\nresult\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A SummarizedExperiment-tibble abstraction: 1 × 1\n# \u001b[90mFeatures=1 | Samples=1 | Assays=counts\u001b[0m\n .feature .sample counts SampleName cell treat Run avgLength Experiment\n \n1 ENSG00000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 SRX384353 \n# ℹ 3 more variables: Sample , BioSample , dex_upper \n\nOperation log:\n[2025-11-04 08:35:11] filter: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:35:11] select: removed 1 (11%), 8 column(s) remaining\n[2025-11-04 08:35:12] mutate: added 1 new column(s): dex_upper\n[2025-11-04 08:35:12] extract: extracted 'dex' into column: treat (original removed)\n[2025-11-04 08:35:12] mutate: modified column(s): Run\n[2025-11-04 08:35:13] filter: removed 64101 genes (100%), 1 genes remaining\n[2025-11-04 08:35:13] slice: Kept 1/4 rows (25.0%); removed 3 rows \n```\n\n\n:::\n:::\n\n\n\nAs a result, the `metadata` shows a short description of the different modifications the `SummarizedExperiment` underwent during each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`.\n\nNotwithstanding, `omicslog` can also work with base R commands by simply adding the dataset name to the `log_start()` function:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(restore_SummarizedExperiment_show = TRUE)\n\nresult_base <- log_start(airway) # Starting the logging operations\n\nresult_base <- result_base[, 
colData(result_base)$dex == \"untrt\"]\ncolData(result_base)$dex_upper <- toupper(colData(result_base)$dex)\ncolData(result_base)$Run <- tolower(colData(result_base)$Run)\nresult_base <- result_base[rownames(result_base) == \"ENSG00000000003\", ]\n\nresult_base\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nclass: SummarizedExperimentLogged \ndim: 1 4 \nmetadata(1): ''\nassays(1): counts\nrownames(1): ENSG00000000003\nrowData names(0):\ncolnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520\ncolData names(10): SampleName cell ... BioSample dex_upper\n\nOperation log:\n[2025-11-04 08:35:14] subset: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:35:14] colData<-: added 1 new column(s): dex_upper\n[2025-11-04 08:35:14] colData<-: modified column 'Run'\n[2025-11-04 08:35:14] subset: removed 64101 genes (100%), 1 genes remaining \n```\n\n\n:::\n:::\n\n\n\n# We need your feedback!\n\n**Tell us your stories:** \n\n* What is your experience working with omics-oriented objects? \n* What difficulties have you faced when tracing changes across different experiments? 
\n* What else can we do to make your research more comfortable and easier to track?\n\nDon’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog \"logging capabilities for SummarizedExperiment objects\") GitHub repo.", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/posts/2025-11-04-introducing-omicslog/index.qmd b/posts/2025-11-04-introducing-omicslog/index.qmd index 0d1f330..4be1001 100755 --- a/posts/2025-11-04-introducing-omicslog/index.qmd +++ b/posts/2025-11-04-introducing-omicslog/index.qmd @@ -1,7 +1,7 @@ --- title: "Omicslog" author: "Juan Henao" -date: "2025-10-23" +date: "2025-11-04" package: tidyomics tags: - tidyomics/tidyomicsBlog From 63d95f1bf5edff55397077a76f81ad0f1d42fe78 Mon Sep 17 00:00:00 2001 From: jdhenaos Date: Tue, 4 Nov 2025 09:50:01 +0100 Subject: [PATCH 3/6] omicslog first blog --- .../index/execute-results/html.json | 4 +- .../2025-11-04-introducing-omicslog/index.qmd | 41 +++++++++++++++++++ 2 files changed, 43 insertions(+), 2 deletions(-) diff --git a/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json b/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json index cad4082..a3d6dcc 100755 --- a/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json +++ b/_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "a401b576ed0644f65c6f9accb3cb05e0", + "hash": "ffe1ebd855c82caa38590a1c1538e180", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Omicslog\"\nauthor: \"Juan Henao\"\ndate: \"2025-11-04\"\npackage: tidyomics\ntags:\n - tidyomics/tidyomicsBlog\n - logging\n - tidyverse\n - bioconductor\ndescription: \"Providing logging capabilities for SummarizedExperiment objects.\"\nformat:\n html:\n toc: true\n toc-float: true\nexecute:\n freeze: true\n---\n\n\n\n# Welcome to omicslog!\n\nI still remember being in front of my PI, trying to recall, or even worse, 
to guess the number of samples we had ignored when running a specific analysis, such as filtering low-count genes for DEG analysis or excluding biological samples collected outside a target time window when we aimed to discover biomarker candidates for early disease detection. I especially remember how that became even worse across the different projects we were working on simultaneously.\n\nThe solution was always the same: rerun the whole code up to the line that could answer those questions. That was even more frustrating considering that, in many cases, those questions came from pure curiosity rather than information that would be included in the final publication.\n\nInspired by the `lab notebook` from my wet lab colleagues and the `tidylog` package, we present `omicslog`, a package that provides logging capabilities for omics-oriented objects. Our goal is to establish a standard for tracking changes to these objects, acting as an automated dry lab notebook and improving the reproducibility of specific analyses.\n\nWe started by enabling logging for the `SummarizedExperiment` class, powered by `tidyomics` functionalities. 
Every function in the pipeline is evaluated, with changes traced and aggregated as metadata.\n\nLet’s start with a practical example, beginning with the package installation and library loading:\n\n\n\n::: {.cell messages='false' warnings='false'}\n::: {.cell-output .cell-output-stderr}\n\n```\nSkipping install of 'omicslog' from a github remote, the SHA1 (eeb57643) has not changed since last install.\n Use `force = TRUE` to force installation\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: MatrixGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: matrixStats\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'MatrixGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,\n colCounts, colCummaxs, colCummins, colCumprods, colCumsums,\n colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,\n colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,\n colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,\n colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,\n colWeightedMeans, colWeightedMedians, colWeightedSds,\n colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,\n rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,\n rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,\n rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,\n rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,\n rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,\n rowWeightedMads, rowWeightedMeans, rowWeightedMedians,\n rowWeightedSds, rowWeightedVars\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomicRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: 
stats4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: BiocGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'BiocGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:stats':\n\n IQR, mad, sd, var, xtabs\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n anyDuplicated, aperm, append, as.data.frame, basename, cbind,\n colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,\n get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,\n match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,\n Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,\n table, tapply, union, unique, unsplit, which.max, which.min\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: S4Vectors\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'S4Vectors'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n expand.grid, I, unname\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: IRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomeInfoDb\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: Biobase\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWelcome to Bioconductor\n\n Vignettes contain introductory material; view with\n 'browseVignettes()'. 
To cite Bioconductor, see\n 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'Biobase'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:MatrixGenerics':\n\n rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n anyMissing, rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'tidySummarizedExperiment'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:IRanges':\n\n slice\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:S4Vectors':\n\n rename\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:matrixStats':\n\n count\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:stats':\n\n filter\n```\n\n\n:::\n:::\n\n\n\nFor this example, we worked with the `airway` dataset. 
To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(airway, package=\"airway\")\n\nresult <- \n airway |>\n log_start() |> # Starting the logging operations\n filter(dex == \"untrt\") |>\n select(!albut) |>\n mutate(dex_upper = toupper(dex)) |>\n extract(col = dex,into = \"treat\") |>\n mutate(Run = tolower(Run)) |> \n filter(.feature == \"ENSG00000000003\") |>\n slice(3)\n\nresult\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A SummarizedExperiment-tibble abstraction: 1 × 1\n# \u001b[90mFeatures=1 | Samples=1 | Assays=counts\u001b[0m\n .feature .sample counts SampleName cell treat Run avgLength Experiment\n \n1 ENSG00000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 SRX384353 \n# ℹ 3 more variables: Sample , BioSample , dex_upper \n\nOperation log:\n[2025-11-04 08:35:11] filter: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:35:11] select: removed 1 (11%), 8 column(s) remaining\n[2025-11-04 08:35:12] mutate: added 1 new column(s): dex_upper\n[2025-11-04 08:35:12] extract: extracted 'dex' into column: treat (original removed)\n[2025-11-04 08:35:12] mutate: modified column(s): Run\n[2025-11-04 08:35:13] filter: removed 64101 genes (100%), 1 genes remaining\n[2025-11-04 08:35:13] slice: Kept 1/4 rows (25.0%); removed 3 rows \n```\n\n\n:::\n:::\n\n\n\nAs a result, the `metadata` shows a short description of the different modifications the `SummarizedExperiment` underwent during each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`.\n\nNotwithstanding, `omicslog` can also work with base R commands by simply adding the dataset name to the `log_start()` function:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(restore_SummarizedExperiment_show = TRUE)\n\nresult_base <- log_start(airway) # Starting the logging operations\n\nresult_base <- result_base[, 
colData(result_base)$dex == \"untrt\"]\ncolData(result_base)$dex_upper <- toupper(colData(result_base)$dex)\ncolData(result_base)$Run <- tolower(colData(result_base)$Run)\nresult_base <- result_base[rownames(result_base) == \"ENSG00000000003\", ]\n\nresult_base\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nclass: SummarizedExperimentLogged \ndim: 1 4 \nmetadata(1): ''\nassays(1): counts\nrownames(1): ENSG00000000003\nrowData names(0):\ncolnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520\ncolData names(10): SampleName cell ... BioSample dex_upper\n\nOperation log:\n[2025-11-04 08:35:14] subset: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:35:14] colData<-: added 1 new column(s): dex_upper\n[2025-11-04 08:35:14] colData<-: modified column 'Run'\n[2025-11-04 08:35:14] subset: removed 64101 genes (100%), 1 genes remaining \n```\n\n\n:::\n:::\n\n\n\n# We need your feedback!\n\n**Tell us your stories:** \n\n* What is your experience working with omics-oriented objects? \n* What difficulties have you faced when tracing changes across different experiments? 
\n* What else can we do to make your research more comfortable and easier to track?\n\nDon’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog \"logging capabilities for SummarizedExperiment objects\") GitHub repo.", + "markdown": "---\ntitle: \"Omicslog\"\nauthor: \"Juan Henao\"\ndate: \"2025-11-04\"\npackage: tidyomics\ntags:\n - tidyomics/tidyomicsBlog\n - logging\n - tidyverse\n - bioconductor\ndescription: \"Providing logging capabilities for SummarizedExperiment objects.\"\nformat:\n html:\n toc: true\n toc-float: true\nexecute:\n freeze: true\n---\n\n\n\n# Welcome to omicslog!\n\nI still remember being in front of my PI, trying to recall, or even worse, to guess the number of samples we had ignored when running a specific analysis, such as filtering low-count genes for DEG analysis or excluding biological samples collected outside a target time window when we aimed to discover biomarker candidates for early disease detection. I especially remember how that became even worse across the different projects we were working on simultaneously.\n\nThe solution was always the same: rerun the whole code up to the line that could answer those questions. That was even more frustrating considering that, in many cases, those questions came from pure curiosity rather than information that would be included in the final publication.\n\nInspired by the `lab notebook` from my wet lab colleagues and the `tidylog` package, we present `omicslog`, a package that provides logging capabilities for omics-oriented objects. Our goal is to establish a standard for tracking changes to these objects, acting as an automated dry lab notebook and improving the reproducibility of specific analyses.\n\nWe started by enabling logging for the `SummarizedExperiment` class, powered by `tidyomics` functionalities. 
Every function in the pipeline is evaluated, with changes traced and aggregated as metadata.\n\nLet’s start with a practical example, beginning with the package installation and library loading:\n\n\n\n::: {.cell messages='false' warnings='false'}\n::: {.cell-output .cell-output-stderr}\n\n```\nSkipping install of 'omicslog' from a github remote, the SHA1 (eeb57643) has not changed since last install.\n Use `force = TRUE` to force installation\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: MatrixGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: matrixStats\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'MatrixGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,\n colCounts, colCummaxs, colCummins, colCumprods, colCumsums,\n colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,\n colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,\n colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,\n colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,\n colWeightedMeans, colWeightedMedians, colWeightedSds,\n colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,\n rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,\n rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,\n rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,\n rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,\n rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,\n rowWeightedMads, rowWeightedMeans, rowWeightedMedians,\n rowWeightedSds, rowWeightedVars\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomicRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: 
stats4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: BiocGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'BiocGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:stats':\n\n IQR, mad, sd, var, xtabs\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n anyDuplicated, aperm, append, as.data.frame, basename, cbind,\n colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,\n get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,\n match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,\n Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,\n table, tapply, union, unique, unsplit, which.max, which.min\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: S4Vectors\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'S4Vectors'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n expand.grid, I, unname\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: IRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomeInfoDb\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: Biobase\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWelcome to Bioconductor\n\n Vignettes contain introductory material; view with\n 'browseVignettes()'. 
To cite Bioconductor, see\n 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'Biobase'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:MatrixGenerics':\n\n rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n anyMissing, rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'tidySummarizedExperiment'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:IRanges':\n\n slice\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:S4Vectors':\n\n rename\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:matrixStats':\n\n count\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:stats':\n\n filter\n```\n\n\n:::\n:::\n\n\n\nFor this example, we worked with the `airway` dataset. 
To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(airway, package=\"airway\")\n\nresult <- \n airway |>\n log_start() |> # Starting the logging operations\n filter(dex == \"untrt\") |>\n select(!albut) |>\n mutate(dex_upper = toupper(dex)) |>\n extract(col = dex,into = \"treat\") |>\n mutate(Run = tolower(Run)) |> \n filter(.feature == \"ENSG00000000003\") |>\n slice(3)\n\nresult\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A SummarizedExperiment-tibble abstraction: 1 × 1\n# \u001b[90mFeatures=1 | Samples=1 | Assays=counts\u001b[0m\n .feature .sample counts SampleName cell treat Run avgLength Experiment\n \n1 ENSG00000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 SRX384353 \n# ℹ 3 more variables: Sample , BioSample , dex_upper \n\nOperation log:\n[2025-11-04 08:49:06] filter: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:49:06] select: removed 1 (11%), 8 column(s) remaining\n[2025-11-04 08:49:07] mutate: added 1 new column(s): dex_upper\n[2025-11-04 08:49:07] extract: extracted 'dex' into column: treat (original removed)\n[2025-11-04 08:49:08] mutate: modified column(s): Run\n[2025-11-04 08:49:08] filter: removed 64101 genes (100%), 1 genes remaining\n[2025-11-04 08:49:08] slice: Kept 1/4 rows (25.0%); removed 3 rows \n```\n\n\n:::\n:::\n\n\n\n:::{.smaller}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#> # A SummarizedExperiment-tibble abstraction: 1 × 22\n#> # Features=1 | Samples=1 | Assays=counts\n#> .feature .sample counts SampleName cell treat Run avgLength \n#> \n#> 1 ENSG0000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 \n#> # ℹ 9 more variables: gene_name , entrezid , gene_biotype , \n#> # gene_seq_start , gene_seq_end , seq_name , seq_strand , \n#> # seq_coord_system , symbol \n#> \n#> Operation log:\n#> [2025-06-09 18:34:15] filter: removed 4 samples (50%), 4 samples remaining\n#> 
[2025-06-09 18:34:15] select: removed 1 (11%), 8 column(s) remaining\n#> [2025-06-09 18:34:16] mutate: added 1 new column(s): dex_upper\n#> [2025-06-09 18:34:17] extract: extracted 'dex' into column: treat (original removed)\n#> [2025-06-09 18:34:17] mutate: modified column(s): Run\n#> [2025-06-09 18:34:17] filter: removed 63676 genes (100%), 1 genes remaining\n#> [2025-06-09 18:34:18] slice: Kept 1/4 rows (25.0%); removed 3 rows\n```\n:::\n\n\n:::\n\nAs a result, the `metadata` shows a short description of the different modifications the `SummarizedExperiment` underwent during each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`.\n\nNotwithstanding, `omicslog` can also work with base R commands by simply adding the dataset name to the `log_start()` function:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(restore_SummarizedExperiment_show = TRUE)\n\nresult_base <- log_start(airway) # Starting the logging operations\n\nresult_base <- result_base[, colData(result_base)$dex == \"untrt\"]\ncolData(result_base)$dex_upper <- toupper(colData(result_base)$dex)\ncolData(result_base)$Run <- tolower(colData(result_base)$Run)\nresult_base <- result_base[rownames(result_base) == \"ENSG00000000003\", ]\n\nresult_base\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nclass: SummarizedExperimentLogged \ndim: 1 4 \nmetadata(1): ''\nassays(1): counts\nrownames(1): ENSG00000000003\nrowData names(0):\ncolnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520\ncolData names(10): SampleName cell ... 
BioSample dex_upper\n\nOperation log:\n[2025-11-04 08:49:08] subset: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:49:08] colData<-: added 1 new column(s): dex_upper\n[2025-11-04 08:49:08] colData<-: modified column 'Run'\n[2025-11-04 08:49:08] subset: removed 64101 genes (100%), 1 genes remaining \n```\n\n\n:::\n:::\n\n\n\n:::{.smaller}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#> class: SummarizedExperimentLogged \n#> dim: 1 4 \n#> metadata(1): ''\n#> assays(1): counts\n#> rownames(1): ENSG00000000003\n#> rowData names(10): gene_id gene_name ... seq_coord_system symbol\n#> colnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520\n#> colData names(10): SampleName cell ... BioSample dex_upper\n#> \n#> Operation log:\n#> [2025-06-05 11:02:29] subset: removed 4 samples (50%), 4 samples remaining\n#> [2025-06-05 11:02:29] colData<-: added 1 new column(s): dex_upper\n#> [2025-06-05 11:02:29] colData<-: modified column 'Run'\n#> [2025-06-05 11:02:29] subset: removed 63676 genes (100%), 1 genes remaining\n```\n:::\n\n\n:::\n\n# We need your feedback!\n\n**Tell us your stories:** \n\n* What is your experience working with omics-oriented objects? \n* What difficulties have you faced when tracing changes across different experiments? 
\n* What else can we do to make your research more comfortable and easier to track?\n\nDon’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog \"logging capabilities for SummarizedExperiment objects\") GitHub repo.", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/posts/2025-11-04-introducing-omicslog/index.qmd b/posts/2025-11-04-introducing-omicslog/index.qmd index 4be1001..c633bfd 100755 --- a/posts/2025-11-04-introducing-omicslog/index.qmd +++ b/posts/2025-11-04-introducing-omicslog/index.qmd @@ -59,6 +59,28 @@ result <- result ``` +:::{.smaller} +```{r} +#> # A SummarizedExperiment-tibble abstraction: 1 × 22 +#> # Features=1 | Samples=1 | Assays=counts +#> .feature .sample counts SampleName cell treat Run avgLength +#> +#> 1 ENSG0000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 +#> # ℹ 9 more variables: gene_name , entrezid , gene_biotype , +#> # gene_seq_start , gene_seq_end , seq_name , seq_strand , +#> # seq_coord_system , symbol +#> +#> Operation log: +#> [2025-06-09 18:34:15] filter: removed 4 samples (50%), 4 samples remaining +#> [2025-06-09 18:34:15] select: removed 1 (11%), 8 column(s) remaining +#> [2025-06-09 18:34:16] mutate: added 1 new column(s): dex_upper +#> [2025-06-09 18:34:17] extract: extracted 'dex' into column: treat (original removed) +#> [2025-06-09 18:34:17] mutate: modified column(s): Run +#> [2025-06-09 18:34:17] filter: removed 63676 genes (100%), 1 genes remaining +#> [2025-06-09 18:34:18] slice: Kept 1/4 rows (25.0%); removed 3 rows +``` +::: + As a result, the `metadata` shows a short description of the different modifications the `SummarizedExperiment` underwent during each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`. 
Notwithstanding, `omicslog` can also work with base R commands by simply adding the dataset name to the `log_start()` function: @@ -76,6 +98,25 @@ result_base <- result_base[rownames(result_base) == "ENSG00000000003", ] result_base ``` +:::{.smaller} +```{r} +#> class: SummarizedExperimentLogged +#> dim: 1 4 +#> metadata(1): '' +#> assays(1): counts +#> rownames(1): ENSG00000000003 +#> rowData names(10): gene_id gene_name ... seq_coord_system symbol +#> colnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520 +#> colData names(10): SampleName cell ... BioSample dex_upper +#> +#> Operation log: +#> [2025-06-05 11:02:29] subset: removed 4 samples (50%), 4 samples remaining +#> [2025-06-05 11:02:29] colData<-: added 1 new column(s): dex_upper +#> [2025-06-05 11:02:29] colData<-: modified column 'Run' +#> [2025-06-05 11:02:29] subset: removed 63676 genes (100%), 1 genes remaining +``` +::: + # We need your feedback! **Tell us your stories:** From 9b1739ccbeb2e0283c49cb11faed11bbf3c179fb Mon Sep 17 00:00:00 2001 From: jdhenaos Date: Tue, 4 Nov 2025 09:56:34 +0100 Subject: [PATCH 4/6] omicslog first blog --- posts/2025-11-04-introducing-omicslog/index.qmd | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/posts/2025-11-04-introducing-omicslog/index.qmd b/posts/2025-11-04-introducing-omicslog/index.qmd index c633bfd..d43d76f 100755 --- a/posts/2025-11-04-introducing-omicslog/index.qmd +++ b/posts/2025-11-04-introducing-omicslog/index.qmd @@ -29,7 +29,7 @@ We started by enabling logging for the `SummarizedExperiment` class, powered by Let’s start with a practical example, beginning with the package installation and library loading: -```{r, echo=FALSE, messages=FALSE, warnings=FALSE} +```r if (!require("devtools", quietly = TRUE)) install.packages("devtools") @@ -42,7 +42,7 @@ library(omicslog) For this example, we worked with the `airway` dataset. 
To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria: -```{r} +```r data(airway, package="airway") result <- @@ -60,7 +60,7 @@ result ``` :::{.smaller} -```{r} +```r #> # A SummarizedExperiment-tibble abstraction: 1 × 22 #> # Features=1 | Samples=1 | Assays=counts #> .feature .sample counts SampleName cell treat Run avgLength @@ -85,7 +85,7 @@ As a result, the `metadata` shows a short description of the different modificat Notwithstanding, `omicslog` can also work with base R commands by simply adding the dataset name to the `log_start()` function: -```{r} +```r options(restore_SummarizedExperiment_show = TRUE) result_base <- log_start(airway) # Starting the logging operations @@ -99,7 +99,7 @@ result_base ``` :::{.smaller} -```{r} +```r #> class: SummarizedExperimentLogged #> dim: 1 4 #> metadata(1): '' From 4565fef5d8100bfcc59e963d852fc13fa5f44046 Mon Sep 17 00:00:00 2001 From: Stefano Mangiola Date: Tue, 4 Nov 2025 20:42:15 +1030 Subject: [PATCH 5/6] Update posts/2025-11-04-introducing-omicslog/index.qmd Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- posts/2025-11-04-introducing-omicslog/index.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/posts/2025-11-04-introducing-omicslog/index.qmd b/posts/2025-11-04-introducing-omicslog/index.qmd index d43d76f..5e29081 100755 --- a/posts/2025-11-04-introducing-omicslog/index.qmd +++ b/posts/2025-11-04-introducing-omicslog/index.qmd @@ -51,7 +51,7 @@ result <- filter(dex == "untrt") |> select(!albut) |> mutate(dex_upper = toupper(dex)) |> - extract(col = dex,into = "treat") |> + extract(col = dex, into = "treat") |> mutate(Run = tolower(Run)) |> filter(.feature == "ENSG00000000003") |> slice(3) From 1aab2b5d36e06877786e43de0755ed1a22bcf795 Mon Sep 17 00:00:00 2001 From: Stefano Mangiola Date: Tue, 4 Nov 2025 20:42:27 +1030 Subject: [PATCH 6/6] Update 
posts/2025-11-04-introducing-omicslog/index.qmd Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- posts/2025-11-04-introducing-omicslog/index.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/posts/2025-11-04-introducing-omicslog/index.qmd b/posts/2025-11-04-introducing-omicslog/index.qmd index 5e29081..57b2948 100755 --- a/posts/2025-11-04-introducing-omicslog/index.qmd +++ b/posts/2025-11-04-introducing-omicslog/index.qmd @@ -43,7 +43,7 @@ library(omicslog) For this example, we worked with the `airway` dataset. To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria: ```r -data(airway, package="airway") +data(airway, package = "airway") result <- airway |>