forked from Bioconductor/biocblog
Omicslog #7
Open
stemangiola
wants to merge
7
commits into
main
from
omicslog
+143
−0
Commits
5af7885
omicslog first blog
jdhenaos 3ffa942
omicslog first blog
jdhenaos 63d95f1
omicslog first blog
jdhenaos 9b1739c
omicslog first blog
jdhenaos 216aca7
Merge pull request #6 from jdhenaos/omicslog
stemangiola 4565fef
Update posts/2025-11-04-introducing-omicslog/index.qmd
stemangiola 1aab2b5
Update posts/2025-11-04-introducing-omicslog/index.qmd
stemangiola
15 changes: 15 additions & 0 deletions
_freeze/posts/2025-11-04-introducing-omicslog/index/execute-results/html.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| { | ||
| "hash": "ffe1ebd855c82caa38590a1c1538e180", | ||
| "result": { | ||
| "engine": "knitr", | ||
| "markdown": "---\ntitle: \"Omicslog\"\nauthor: \"Juan Henao\"\ndate: \"2025-11-04\"\npackage: tidyomics\ntags:\n - tidyomics/tidyomicsBlog\n - logging\n - tidyverse\n - bioconductor\ndescription: \"Providing logging capabilities for SummarizedExperiment objects.\"\nformat:\n html:\n toc: true\n toc-float: true\nexecute:\n freeze: true\n---\n\n\n\n# Welcome to omicslog!\n\nI still remember being in front of my PI, trying to recall, or even worse, to guess the number of samples we had ignored when running a specific analysis, such as filtering low-count genes for DEG analysis or excluding biological samples collected outside a target time window when we aimed to discover biomarker candidates for early disease detection. I especially remember how that became even worse across the different projects we were working on simultaneously.\n\nThe solution was always the same: rerun the whole code up to the line that could answer those questions. That was even more frustrating considering that, in many cases, those questions came from pure curiosity rather than information that would be included in the final publication.\n\nInspired by the `lab notebook` from my wet lab colleagues and the `tidylog` package, we present `omicslog`, a package that provides logging capabilities for omics-oriented objects. Our goal is to establish a standard for tracking changes to these objects, acting as an automated dry lab notebook and improving the reproducibility of specific analyses.\n\nWe started by enabling logging for the `SummarizedExperiment` class, powered by `tidyomics` functionalities. 
Every function in the pipeline is evaluated, with changes traced and aggregated as metadata.\n\nLet’s start with a practical example, beginning with the package installation and library loading:\n\n\n\n::: {.cell messages='false' warnings='false'}\n::: {.cell-output .cell-output-stderr}\n\n```\nSkipping install of 'omicslog' from a github remote, the SHA1 (eeb57643) has not changed since last install.\n Use `force = TRUE` to force installation\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: MatrixGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: matrixStats\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'MatrixGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,\n colCounts, colCummaxs, colCummins, colCumprods, colCumsums,\n colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,\n colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,\n colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,\n colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,\n colWeightedMeans, colWeightedMedians, colWeightedSds,\n colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,\n rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,\n rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,\n rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,\n rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,\n rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,\n rowWeightedMads, rowWeightedMeans, rowWeightedMedians,\n rowWeightedSds, rowWeightedVars\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomicRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: 
stats4\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: BiocGenerics\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'BiocGenerics'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:stats':\n\n IQR, mad, sd, var, xtabs\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n anyDuplicated, aperm, append, as.data.frame, basename, cbind,\n colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,\n get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,\n match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,\n Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,\n table, tapply, union, unique, unsplit, which.max, which.min\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: S4Vectors\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'S4Vectors'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:base':\n\n expand.grid, I, unname\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: IRanges\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: GenomeInfoDb\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nLoading required package: Biobase\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nWelcome to Bioconductor\n\n Vignettes contain introductory material; view with\n 'browseVignettes()'. 
To cite Bioconductor, see\n 'citation(\"Biobase\")', and for packages 'citation(\"pkgname\")'.\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'Biobase'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:MatrixGenerics':\n\n rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following objects are masked from 'package:matrixStats':\n\n anyMissing, rowMedians\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\n\nAttaching package: 'tidySummarizedExperiment'\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:IRanges':\n\n slice\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:S4Vectors':\n\n rename\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:matrixStats':\n\n count\n```\n\n\n:::\n\n::: {.cell-output .cell-output-stderr}\n\n```\nThe following object is masked from 'package:stats':\n\n filter\n```\n\n\n:::\n:::\n\n\n\nFor this example, we worked with the `airway` dataset. 
To extend `tidyomics` with `omicslog`, it is only necessary to add the `log_start()` function before applying the different filtering criteria:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata(airway, package=\"airway\")\n\nresult <- \n airway |>\n log_start() |> # Starting the logging operations\n filter(dex == \"untrt\") |>\n select(!albut) |>\n mutate(dex_upper = toupper(dex)) |>\n extract(col = dex,into = \"treat\") |>\n mutate(Run = tolower(Run)) |> \n filter(.feature == \"ENSG00000000003\") |>\n slice(3)\n\nresult\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A SummarizedExperiment-tibble abstraction: 1 × 1\n# \u001b[90mFeatures=1 | Samples=1 | Assays=counts\u001b[0m\n .feature .sample counts SampleName cell treat Run avgLength Experiment\n <chr> <chr> <int> <fct> <fct> <chr> <chr> <int> <fct> \n1 ENSG00000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 SRX384353 \n# ℹ 3 more variables: Sample <fct>, BioSample <fct>, dex_upper <chr>\n\nOperation log:\n[2025-11-04 08:49:06] filter: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:49:06] select: removed 1 (11%), 8 column(s) remaining\n[2025-11-04 08:49:07] mutate: added 1 new column(s): dex_upper\n[2025-11-04 08:49:07] extract: extracted 'dex' into column: treat (original removed)\n[2025-11-04 08:49:08] mutate: modified column(s): Run\n[2025-11-04 08:49:08] filter: removed 64101 genes (100%), 1 genes remaining\n[2025-11-04 08:49:08] slice: Kept 1/4 rows (25.0%); removed 3 rows \n```\n\n\n:::\n:::\n\n\n\n:::{.smaller}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#> # A SummarizedExperiment-tibble abstraction: 1 × 22\n#> # Features=1 | Samples=1 | Assays=counts\n#> .feature .sample counts SampleName cell treat Run avgLength \n#> <chr> <chr> <int> <fct> <fct> <chr> <chr> <int> \n#> 1 ENSG0000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 \n#> # ℹ 9 more variables: gene_name <chr>, entrezid <int>, gene_biotype <chr>, \n#> # gene_seq_start <int>, gene_seq_end <int>, seq_name <chr>, seq_strand 
<int>, \n#> # seq_coord_system <int>, symbol <chr>\n#> \n#> Operation log:\n#> [2025-06-09 18:34:15] filter: removed 4 samples (50%), 4 samples remaining\n#> [2025-06-09 18:34:15] select: removed 1 (11%), 8 column(s) remaining\n#> [2025-06-09 18:34:16] mutate: added 1 new column(s): dex_upper\n#> [2025-06-09 18:34:17] extract: extracted 'dex' into column: treat (original removed)\n#> [2025-06-09 18:34:17] mutate: modified column(s): Run\n#> [2025-06-09 18:34:17] filter: removed 63676 genes (100%), 1 genes remaining\n#> [2025-06-09 18:34:18] slice: Kept 1/4 rows (25.0%); removed 3 rows\n```\n:::\n\n\n:::\n\nAs a result, the `metadata` shows a short description of the different modifications the `SummarizedExperiment` underwent during each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`.\n\nNotwithstanding, `omicslog` can also work with base R commands by simply adding the dataset name to the `log_start()` function:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\noptions(restore_SummarizedExperiment_show = TRUE)\n\nresult_base <- log_start(airway) # Starting the logging operations\n\nresult_base <- result_base[, colData(result_base)$dex == \"untrt\"]\ncolData(result_base)$dex_upper <- toupper(colData(result_base)$dex)\ncolData(result_base)$Run <- tolower(colData(result_base)$Run)\nresult_base <- result_base[rownames(result_base) == \"ENSG00000000003\", ]\n\nresult_base\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nclass: SummarizedExperimentLogged \ndim: 1 4 \nmetadata(1): ''\nassays(1): counts\nrownames(1): ENSG00000000003\nrowData names(0):\ncolnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520\ncolData names(10): SampleName cell ... 
BioSample dex_upper\n\nOperation log:\n[2025-11-04 08:49:08] subset: removed 4 samples (50%), 4 samples remaining\n[2025-11-04 08:49:08] colData<-: added 1 new column(s): dex_upper\n[2025-11-04 08:49:08] colData<-: modified column 'Run'\n[2025-11-04 08:49:08] subset: removed 64101 genes (100%), 1 genes remaining \n```\n\n\n:::\n:::\n\n\n\n:::{.smaller}\n\n\n::: {.cell}\n\n```{.r .cell-code}\n#> class: SummarizedExperimentLogged \n#> dim: 1 4 \n#> metadata(1): ''\n#> assays(1): counts\n#> rownames(1): ENSG00000000003\n#> rowData names(10): gene_id gene_name ... seq_coord_system symbol\n#> colnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520\n#> colData names(10): SampleName cell ... BioSample dex_upper\n#> \n#> Operation log:\n#> [2025-06-05 11:02:29] subset: removed 4 samples (50%), 4 samples remaining\n#> [2025-06-05 11:02:29] colData<-: added 1 new column(s): dex_upper\n#> [2025-06-05 11:02:29] colData<-: modified column 'Run'\n#> [2025-06-05 11:02:29] subset: removed 63676 genes (100%), 1 genes remaining\n```\n:::\n\n\n:::\n\n# We need your feedback!\n\n**Tell us your stories:** \n\n* What is your experience working with omics-oriented objects? \n* What difficulties have you faced when tracing changes across different experiments? \n* What else can we do to make your research more comfortable and easier to track?\n\nDon’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog \"logging capabilities for SummarizedExperiment objects\") GitHub repo.", | ||
| "supporting": [], | ||
| "filters": [ | ||
| "rmarkdown/pagebreak.lua" | ||
| ], | ||
| "includes": {}, | ||
| "engineDependencies": {}, | ||
| "preserve": {}, | ||
| "postProcess": true | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,128 @@ | ||
| --- | ||
| title: "Omicslog" | ||
| author: "Juan Henao" | ||
| date: "2025-11-04" | ||
| package: tidyomics | ||
| tags: | ||
| - tidyomics/tidyomicsBlog | ||
| - logging | ||
| - tidyverse | ||
| - bioconductor | ||
| description: "Providing logging capabilities for SummarizedExperiment objects." | ||
| format: | ||
| html: | ||
| toc: true | ||
| toc-float: true | ||
| execute: | ||
| freeze: true | ||
| --- | ||
|
|
||
| # Welcome to omicslog! | ||
|
|
||
| I still remember standing in front of my PI, trying to recall (or, even worse, to guess) how many samples we had excluded from a specific analysis, for example when filtering low-count genes for DEG analysis or when dropping biological samples collected outside a target time window while searching for biomarker candidates for early disease detection. It became even harder when we were working on several projects simultaneously. | ||
|
|
||
| The solution was always the same: rerun the entire script up to the line that could answer those questions. That was even more frustrating because, in many cases, the questions came from pure curiosity rather than from a need for information that would appear in the final publication. | ||
|
|
||
| Inspired by the lab notebooks kept by my wet-lab colleagues and by the `tidylog` package, we present `omicslog`, a package that provides logging capabilities for omics-oriented objects. Our goal is to establish a standard for tracking changes to these objects, so that the log acts as an automated dry-lab notebook and improves the reproducibility of analyses. | ||
|
|
||
| We started by enabling logging for the `SummarizedExperiment` class, building on the `tidyomics` ecosystem. Each function call in the pipeline is evaluated, and the changes it makes are traced and accumulated in the object's metadata. | ||
|
|
||
| Let’s start with a practical example, beginning with the package installation and library loading: | ||
|
|
||
| ```r | ||
| if (!require("devtools", quietly = TRUE)) | ||
| install.packages("devtools") | ||
|
|
||
| devtools::install_github("tidyomics/omicslog") | ||
|
|
||
| library(SummarizedExperiment) | ||
| library(tidySummarizedExperiment) | ||
| library(omicslog) | ||
| ``` | ||
|
|
||
| For this example, we use the `airway` dataset. To extend a `tidyomics` pipeline with `omicslog`, you only need to add `log_start()` before the data-manipulation steps: | ||
|
|
||
| ```r | ||
| data(airway, package = "airway") | ||
|
|
||
| result <- | ||
| airway |> | ||
| log_start() |> # Starting the logging operations | ||
| filter(dex == "untrt") |> | ||
| select(!albut) |> | ||
| mutate(dex_upper = toupper(dex)) |> | ||
| extract(col = dex, into = "treat") |> | ||
| mutate(Run = tolower(Run)) |> | ||
| filter(.feature == "ENSG00000000003") |> | ||
| slice(3) | ||
|
|
||
| result | ||
| ``` | ||
|
|
||
| :::{.smaller} | ||
| ```r | ||
| #> # A SummarizedExperiment-tibble abstraction: 1 × 22 | ||
| #> # Features=1 | Samples=1 | Assays=counts | ||
| #> .feature .sample counts SampleName cell treat Run avgLength | ||
| #> <chr> <chr> <int> <fct> <fct> <chr> <chr> <int> | ||
| #> 1 ENSG0000000… SRR103… 1138 GSM1275870 N080… untrt srr1… 120 | ||
| #> # ℹ 9 more variables: gene_name <chr>, entrezid <int>, gene_biotype <chr>, | ||
| #> # gene_seq_start <int>, gene_seq_end <int>, seq_name <chr>, seq_strand <int>, | ||
| #> # seq_coord_system <int>, symbol <chr> | ||
| #> | ||
| #> Operation log: | ||
| #> [2025-06-09 18:34:15] filter: removed 4 samples (50%), 4 samples remaining | ||
| #> [2025-06-09 18:34:15] select: removed 1 (11%), 8 column(s) remaining | ||
| #> [2025-06-09 18:34:16] mutate: added 1 new column(s): dex_upper | ||
| #> [2025-06-09 18:34:17] extract: extracted 'dex' into column: treat (original removed) | ||
| #> [2025-06-09 18:34:17] mutate: modified column(s): Run | ||
| #> [2025-06-09 18:34:17] filter: removed 63676 genes (100%), 1 genes remaining | ||
| #> [2025-06-09 18:34:18] slice: Kept 1/4 rows (25.0%); removed 3 rows | ||
| ``` | ||
| ::: | ||
|
|
||
| As a result, the object's `metadata` holds a short description of the modification made by each function in the pipeline, formatted as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`. | ||
|
|
||
| `omicslog` also works with base R commands: simply pass the dataset to `log_start()` and continue with the usual accessors: | ||
|
|
||
| ```r | ||
| options(restore_SummarizedExperiment_show = TRUE) | ||
|
|
||
| result_base <- log_start(airway) # Starting the logging operations | ||
|
|
||
| result_base <- result_base[, colData(result_base)$dex == "untrt"] | ||
| colData(result_base)$dex_upper <- toupper(colData(result_base)$dex) | ||
| colData(result_base)$Run <- tolower(colData(result_base)$Run) | ||
| result_base <- result_base[rownames(result_base) == "ENSG00000000003", ] | ||
|
|
||
| result_base | ||
| ``` | ||
|
|
||
| :::{.smaller} | ||
| ```r | ||
| #> class: SummarizedExperimentLogged | ||
| #> dim: 1 4 | ||
| #> metadata(1): '' | ||
| #> assays(1): counts | ||
| #> rownames(1): ENSG00000000003 | ||
| #> rowData names(10): gene_id gene_name ... seq_coord_system symbol | ||
| #> colnames(4): SRR1039508 SRR1039512 SRR1039516 SRR1039520 | ||
| #> colData names(10): SampleName cell ... BioSample dex_upper | ||
| #> | ||
| #> Operation log: | ||
| #> [2025-06-05 11:02:29] subset: removed 4 samples (50%), 4 samples remaining | ||
| #> [2025-06-05 11:02:29] colData<-: added 1 new column(s): dex_upper | ||
| #> [2025-06-05 11:02:29] colData<-: modified column 'Run' | ||
| #> [2025-06-05 11:02:29] subset: removed 63676 genes (100%), 1 genes remaining | ||
| ``` | ||
| ::: | ||
|
|
||
| # We need your feedback! | ||
|
|
||
| **Tell us your stories:** | ||
|
|
||
| * What is your experience working with omics-oriented objects? | ||
| * What difficulties have you faced when tracing changes across different experiments? | ||
| * What else can we do to make your research more comfortable and easier to track? | ||
|
|
||
| Don’t hesitate to open an issue in the [omicslog](https://github.com/tidyomics/omicslog "logging capabilities for SummarizedExperiment objects") GitHub repo. | ||
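
The post describes each log line as `[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION`. As a rough illustration of that format only, here is a minimal base R sketch. The helpers `describe_removal()` and `format_log_entry()` are hypothetical names invented for this example; they are not part of `omicslog`, and the package's actual implementation may differ.

```r
# Hypothetical helpers for illustration only; these are NOT omicslog functions.

# Describe how many items a filtering step removed and how many remain.
describe_removal <- function(n_before, n_after, unit = "samples") {
  n_removed <- n_before - n_after
  pct <- round(100 * n_removed / n_before)
  sprintf("removed %d %s (%d%%), %d %s remaining",
          n_removed, unit, pct, n_after, unit)
}

# Assemble one log line as "[TIME] FUNCTION NAME: ONE-LINE DESCRIPTION".
format_log_entry <- function(fn_name, description, time = Sys.time()) {
  sprintf("[%s] %s: %s",
          format(time, "%Y-%m-%d %H:%M:%S"), fn_name, description)
}

# Reproduce the first log line of the example: 8 samples before, 4 after.
entry <- format_log_entry(
  "filter",
  describe_removal(8, 4, "samples"),
  time = as.POSIXct("2025-06-09 18:34:15", tz = "UTC")
)
print(entry)
# [1] "[2025-06-09 18:34:15] filter: removed 4 samples (50%), 4 samples remaining"
```

The counts 8 and 4 come from the first log line in the post (4 samples removed, 50%, 4 remaining). The timestamp is fixed here only so the output is reproducible; a real logger would presumably record the time of each call.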