Major updates
- Substantially updated and optimized the `exir` function for improved scalability, flexibility, and usability across bulk and single-cell omics datasets.
- Updated `exir` to accept experimental data in multiple formats, including data frames, tibbles, matrices, sparse matrices, and Seurat objects.
- Updated the expected non-Seurat experimental data format for `exir` to support the common omics layout, with features/genes in rows and samples/cells in columns. Internally, `exir` automatically converts the input to the required analysis format.
- Replaced the previous `Condition_colname` workflow with the more flexible `condition` argument. The `condition` argument can now be either a condition row/column name or a character/factor vector with the same order as the samples/cells in the input data. For Seurat objects, `condition` can be the name of a metadata column.
- Added Seurat object support to `exir` via the new `assay` and `layer` arguments.
- Added the `Exptl_data_type` argument to specify whether the experimental data are `"bulk"` or `"sc"`, enabling data-type-aware preprocessing, normalization, pseudo-sampling, and warnings.
- Restored and redesigned the `normalize` argument. For bulk count-like data, `normalize = TRUE` now applies TMM normalization followed by logCPM transformation using `edgeR`. For single-cell data, normalization is applied after pseudo-bulk aggregation when pseudo-sampling is enabled.
- Added pseudo-sampling/pseudo-bulking support to `exir` through the new `pseudo_sample` and `pseudo_samples_per_group` arguments. This is particularly useful for large datasets and single-cell RNA-seq data.
- Implemented condition-stratified, non-overlapping pseudo-sampling. For bulk data, pseudo-samples are generated by averaging normalized expression values within condition-specific groups. For single-cell data, pseudo-bulk samples are generated by summing raw counts within condition-specific groups followed by TMM/logCPM normalization using `edgeR`.
- Added the `Exptl_data_size_check` argument to optionally prompt users to consider pseudo-sampling when the number of samples/cells is large.
- Added conservative feature filtering to `exir` through the new `feature_filter`, `min_feature_prevalence`, `min_feature_total`, `min_feature_variance`, and `always_keep_diff_features` arguments. This filter removes uninformative features with insufficient prevalence, total signal, or variance without performing highly variable gene selection. Features in `Diff_data` and `Desired_list` can be forced to remain in the analysis.
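
The condition-stratified, non-overlapping pseudo-bulking described above can be illustrated with a minimal base R sketch. This is not the code used inside `exir`: the toy count matrix, the `pseudo_per_group` variable (a stand-in for `pseudo_samples_per_group`), and the random assignment of cells to pseudo-samples are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical toy data: 6 genes (rows) x 12 cells (columns) of raw counts
counts <- matrix(rpois(6 * 12, lambda = 5), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("cell", 1:12)))
condition <- rep(c("ctrl", "treated"), each = 6)
pseudo_per_group <- 2  # assumed stand-in for pseudo_samples_per_group

# Assign every cell to exactly one pseudo-sample within its own condition,
# so pseudo-samples never overlap and never mix conditions
group_id <- character(ncol(counts))
for (cond in unique(condition)) {
  idx <- which(condition == cond)
  idx <- idx[sample.int(length(idx))]                    # shuffle within condition
  bin <- rep_len(seq_len(pseudo_per_group), length(idx)) # round-robin bins
  group_id[idx] <- paste(cond, bin, sep = "_")
}

# Sum raw counts per pseudo-sample; features stay in rows
pseudo_bulk <- t(rowsum(t(counts), group = group_id))

# Condition label of each pseudo-sample, recovered from its name
pseudo_condition <- sub("_[0-9]+$", "", colnames(pseudo_bulk))
```

In this sketch the summed counts would then be passed to a normalization step (the changelog names TMM/logCPM via `edgeR`), which is omitted here to keep the example free of external dependencies.
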
Performance improvements
- Optimized `exir` data preparation to delay dense conversion of sparse input data until after optional pseudo-sampling and feature filtering, reducing memory pressure for large omics datasets.
- Optimized PCA in `exir` by replacing full PCA with truncated PCA using `irlba::prcomp_irlba` for the first principal component.
- Optimized the correlation table handling in `exir` to avoid unnecessary full-table duplication while preserving the original association analysis logic and output.
- Optimized graph reconstruction in `exir` by removing unintended self-loops while preserving multiple edges.
- Optimized neighbourhood score calculation in `exir` by replacing row-by-row `igraph::neighbors()` calls with sparse adjacency matrix multiplication.
- Vectorized row-wise scoring and classification operations in `exir`, including primitive driver score calculation and driver/biomarker type assignment.
- Optimized the extraction of first- and second-order associated drivers for mediator tables by replacing repeated regex-based `grep()` searches with batched neighbourhood retrieval and set-based matching.
- Optimized several IVI-related routines while preserving output consistency with the original implementation.
- Optimized `clusterRank` by avoiding repeated degree calculations and reducing redundant graph traversal.
- Optimized `lh_index` and `h_index` by precomputing repeated neighbourhood-size and H-index components where possible.
- Optimized `neighborhood.connectivity` by precomputing first-order neighbourhood sizes and reducing repeated calls to `igraph::neighborhood.size`.
- Optimized `collective.influence` while preserving identical output.
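
As a rough illustration of why sparse adjacency matrix multiplication can replace per-node neighbour lookups, the sketch below sums the scores of each node's neighbours in a single matrix-vector product and checks the result against an explicit loop. The toy graph, the `node_score` vector, and the choice of "sum of neighbours' scores" as the quantity are assumptions; the exact neighbourhood score computed by `exir` is not reproduced here.

```r
library(Matrix)

# Hypothetical toy graph: 5 nodes, undirected edges given as node-index pairs
edges <- matrix(c(1, 2,
                  1, 3,
                  2, 3,
                  3, 4,
                  4, 5), ncol = 2, byrow = TRUE)
n <- 5
node_score <- c(2, 1, 4, 3, 5)  # assumed per-node scores

# Symmetric sparse adjacency matrix (each edge entered in both directions)
adj <- sparseMatrix(i = c(edges[, 1], edges[, 2]),
                    j = c(edges[, 2], edges[, 1]),
                    x = 1, dims = c(n, n))

# Sum of neighbours' scores for all nodes in one matrix-vector product
neigh_sum_fast <- as.numeric(adj %*% node_score)

# Reference implementation: look up the neighbours of one node at a time
neigh_sum_loop <- vapply(seq_len(n), function(v) {
  nb <- c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1])
  sum(node_score[nb])
}, numeric(1))

# Both approaches agree on this toy graph
stopifnot(isTRUE(all.equal(neigh_sum_fast, neigh_sum_loop)))
```

Because `sparseMatrix()` adds up the values of repeated `(i, j)` pairs, duplicated edges would contribute once per copy in this formulation.
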
Usability and documentation
- Replaced older verbose output in `exir` with cleaner `cli`-based progress and stage reporting.
- Updated error and warning messages in the `exir` workflow using `cli` for clearer user-facing feedback.
- Updated the ExIR vignette to document the new input formats, data orientation, condition handling, normalization options, pseudo-sampling workflow, Seurat support, and conservative feature filtering.
- Updated `exir` documentation to clarify recommended input requirements for bulk and single-cell data.
- Updated `exir` documentation to clarify that TMM/logCPM normalization is appropriate for many bulk RNA-seq count datasets but may not be appropriate for all omics data modalities.
- Updated examples for the new `condition`, `Exptl_data_type`, `Exptl_data_orientation`, `normalize`, `pseudo_sample`, and `feature_filter` workflows.
Dependency updates
- Added `edgeR` for TMM/logCPM normalization of count-like bulk data and pseudo-bulked single-cell data.
- Added use of `cli` for improved messages and progress reporting.
- Added use of `Matrix` utilities for sparse matrix handling and efficient graph/neighbourhood calculations.
- Added optional Seurat object support through `SeuratObject`.
Bug fixes and minor improvements
- Improved handling of missing values in experimental data during preprocessing.
- Improved detection of count-like, sparse, bulk-like, and single-cell-like input data characteristics.
- Improved validation of Seurat assays, layers, and metadata-derived condition labels.
- Improved validation of pseudo-sampling settings, including condition-specific sample/cell counts.
- Improved memory cleanup after full correlation table reduction in `exir`.
- Improved consistency of output table preparation after vectorized score/type calculations.