This repository contains code for the paper, to appear at IEEE ISIT 2023:
"Efficiently Computing Sparse Fourier Transforms of
*Equal contribution: These authors contributed equally.
Check out our NEW Youtube video HERE
This package may be useful to you if you deal with complicated functions of
Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences. Real-world functions often have structures that manifest in a sparse Fourier transform, and previous works have shown that under the assumption of sparsity the transform can be computed efficiently. But what if we want to compute the Fourier transform of functions defined over a
The main functionality of our algorithm is available in the QSFT class. A minimal example can be found in
synt_exp/quick_example.py. Details on how this file works can be found in other sections of the README.
In this section, we discuss the Signal objects that we use to interface with the QSFT class.
A Signal encapsulates the object that we are trying to transform (you may interpret it as a signal of length $q^n$). We use the SubsampledSignal class found at qsft.input_signal_subsampled.SubsampledSignal. This class can be extended to
easily create a signal for the specific application that we desire. For example, we create a
synthetic signal that is sparse in the Fourier domain in
synt_exp.synt_src.synthetic_signal.SyntheticSparseSignal. The subsample() function must be implemented in the
extended class. This function takes a list of query_indices and outputs a list of function/signal values at the given
query indices. We refer to the SyntheticSparseSignal as an example.
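To make the subsample() contract concrete, here is a minimal, self-contained sketch. It does not use the repository's SubsampledSignal base class: ToySparseSignal, its constructor arguments, and the sign and normalization convention of the transform are all assumptions made for illustration only.

```python
import cmath


class ToySparseSignal:
    """Hypothetical stand-in for a signal that is sparse in the Fourier domain.

    Not the repository's SubsampledSignal: it only mimics the idea that
    subsample() maps a list of query indices to a list of signal values.
    """

    def __init__(self, q, coefficients):
        self.q = q
        # coefficients: {frequency index (tuple of ints in 0..q-1): complex value}
        self.coefficients = coefficients

    def subsample(self, query_indices):
        """Return the signal value at each q-ary index in query_indices."""
        omega = cmath.exp(2j * cmath.pi / self.q)
        values = []
        for m in query_indices:
            value = 0j
            for k, coeff in self.coefficients.items():
                # Inner product <k, m> taken modulo q (assumed convention).
                inner = sum(ki * mi for ki, mi in zip(k, m)) % self.q
                value += coeff * omega ** inner
            values.append(value)
        return values


signal = ToySparseSignal(q=3, coefficients={(1, 0, 2): 2.0, (0, 1, 1): -1.0})
samples = signal.subsample([(0, 0, 0), (1, 2, 0)])
```

Here every query index is a length-3 vector over an alphabet of size 3, and the returned list has one value per query, in the same order.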
We can construct a SyntheticSparseSignal as follows. First, we need to declare the query_args:
query_args = {
"subsampling_method": "qsft",
"query_method": "complex",
"num_subsample": num_subsample,
"b": b,
"delays_method_source": "identity",
"delays_method_channel": "nso",
"num_repeat": num_repeat,
}

Let's break this down.
- subsampling_method should be set to qsft if we plan to use the QSFT class; otherwise it should be set to lasso if LASSO will be used.
- The query_method argument is set to "complex", which generates our subsampling matrices $M_c$ randomly. This works very well in practice, in particular for situations where you do not expect the Fourier coefficients to be uniformly distributed. Alternatively, setting this argument to "simple" will generate $M_c$ according to the identity matrix structure in our paper, which works provably well when the Fourier coefficients are uniformly distributed.
- The num_subsample parameter sets $C$, the number of different matrices $M_c,\; c=1,\dotsc,C$ that are used. A good place to start is $C=3$, but you can adjust it later to potentially improve performance.
- b determines the inner dimension of the subsampling. This parameter must be chosen such that the number of non-zero coefficients is $O(q^b)$. If you don't know the sparsity of your signal, you may have to adjust b until you find a suitable value. Note that you don't want to make b too large either, as that will increase your sample and computational complexity.
- The delays_method_source parameter is set to "identity". In general, this should be set to "identity", unless you know that the max Hamming weight of the non-zero Fourier coefficients is low (i.e., the Fourier transform is low degree). The "identity" setting uses $n$ delays. If you know, however, that the max Hamming weight (i.e., degree) of the non-zero Fourier coefficients is lower than $t$ and $q$ is prime, then you can use the "coded" setting, which uses only $2t \log_q n$ delays instead, a significant improvement when $n$ is large. Low degree is often the case when the function you are dealing with represents some real-world function.
- If you set delays_method_source to "coded", you must also include the t parameter. The QSFT class reports the max Hamming weight of non-zero coefficients, so if you find that it is consistently low, consider enabling this for a significant speedup.
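As a rough illustration of the trade-offs above, the following sketch compares the $n$ delays of the "identity" setting with the $2t \log_q n$ figure quoted for "coded", and finds the smallest b with $q^b$ at least an expected sparsity. The function names are hypothetical, rounding up is an assumption, and the actual constants used by the implementation may differ.

```python
import math


def delay_counts(n, q, t):
    """Return (identity_delays, coded_delays) using the formulas in the text."""
    identity = n
    coded = math.ceil(2 * t * math.log(n, q))  # 2 t log_q(n), rounded up
    return identity, coded


def smallest_b(q, sparsity):
    """Smallest b with q**b >= sparsity; a starting point, not a guarantee."""
    b = 1
    while q ** b < sparsity:
        b += 1
    return b


identity, coded = delay_counts(n=100, q=5, t=3)  # 100 delays versus 18
b_start = smallest_b(q=5, sparsity=1000)         # 5, since 5**4 = 625 < 1000
```

Since the sparsity only needs to be $O(q^b)$, the value returned by smallest_b is just a reasonable first guess to tune from.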
With query_args set, we can now construct our signal object. To do so, we call get_random_subsampled_signal,
which randomly generates a SyntheticSubsampledSignal for us.
test_signal = get_random_subsampled_signal(n=n,
                                           q=q,
                                           sparsity=sparsity,
                                           a_min=a_min,
                                           a_max=a_max,
                                           noise_sd=noise_sd,
                                           query_args=query_args,
                                           max_weight=t)

Some parameters are explained below:
- n, q represent the number of function inputs and the alphabet size respectively (interpreted as a signal, this is a signal with $q^n$ elements).
- sparsity is the number of non-zero coefficients that should be in the transform.
- a_min and a_max are the minimum and maximum modulus of the non-zero coefficients; each modulus is chosen uniformly over this range.
- noise_sd is the standard deviation of the additive noise added to the signal.
- max_weight (optional) is the max weight of non-zero Fourier coefficients in the generated signal. The set of indices for the non-zero Fourier coefficients is chosen uniformly over all indices with Hamming weight max_weight or less. In general, you probably want max_weight to be equal to t in query_args, since setting t ensures you only look for coefficients with indices of weight t or less.
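The roles of sparsity, a_min, a_max and max_weight can be sketched in plain Python as follows. This is not the repository's get_random_subsampled_signal: the function name is hypothetical, and the uniformly random phase of each coefficient is an assumption, since the text above only specifies the modulus.

```python
import cmath
import math
import random


def sample_sparse_coefficients(n, q, sparsity, a_min, a_max, max_weight, seed=0):
    """Draw `sparsity` distinct q-ary indices of Hamming weight <= max_weight."""
    rng = random.Random(seed)
    # Number of length-n q-ary vectors with exactly w non-zero entries.
    counts = [math.comb(n, w) * (q - 1) ** w for w in range(max_weight + 1)]
    if sparsity > sum(counts):
        raise ValueError("not enough indices of the requested weight")
    coefficients = {}
    while len(coefficients) < sparsity:
        # Picking w proportionally to counts makes the index uniform overall.
        w = rng.choices(range(max_weight + 1), weights=counts)[0]
        index = [0] * n
        for position in rng.sample(range(n), w):
            index[position] = rng.randint(1, q - 1)
        modulus = rng.uniform(a_min, a_max)
        phase = rng.uniform(0, 2 * math.pi)  # assumed: uniformly random phase
        coefficients[tuple(index)] = cmath.rect(modulus, phase)
    return coefficients


coeffs = sample_sparse_coefficients(n=10, q=3, sparsity=5,
                                    a_min=1.0, a_max=2.0, max_weight=2)
```

The result maps each chosen frequency index to its complex coefficient, so it contains exactly sparsity entries, all of weight at most max_weight.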
Now that we have a signal object, the next step is to take its transform!
Once we construct the signal we want to transform, the next step is to create the QSFT object that will perform the transformation. Again, we start with the key arguments, collected in qsft_args:
qsft_args = {
"num_subsample": num_subsample,
"num_repeat": num_repeat,
"reconstruct_method_source": delays_method_source,
"reconstruct_method_channel": delays_method_channel,
"b": b,
"noise_sd": noise_sd,
"source_decoder": decoder
}

- num_subsample, num_repeat, and b are similar to the equivalent parameters for the signal object. A QSFT instance may only be used on a signal if its corresponding parameters are less than or equal to those of the signal. For example, if we have a signal with num_subsample = 3, we can set num_subsample to be any value $\leq 3$.
- reconstruct_method_source and reconstruct_method_channel must exactly match the delays_method_source and delays_method_channel of the signal you intend to use with the QSFT instance. If delays_method_source = "coded", you must also pass a function handle source_decoder. We have implemented a function that returns a suitable Reed-Solomon decoder in get_reed_solomon_dec.
- noise_sd is a hyperparameter that is a proxy for the amount of additive noise in the signal. If the signal is truly corrupted by additive Gaussian noise, using the standard deviation of that noise is a good choice for this parameter; otherwise, in a real-world setting, you may have to adjust it to find a suitable value.

We can then use these values to create an instance of QSFT.
sft = QSFT(**qsft_args)
result = sft.transform(test_signal, verbosity=0, timing_verbose=True, report=True, sort=True)

- The verbosity argument determines the amount of printouts, not including timing information. When it is set to 0 there are no printouts; when it is set to 10, the maximum number of printouts is provided.
- When timing_verbose is True, information about how long each step of the transform took is included.
- When report is set to False, only the transform is output; when it is set to True, a collection of useful statistics is included. The docstring of the QSFT class contains more information about what is included in the output in this case. In synt_exp/quick_example.py an example is provided where the additional information is processed and displayed.
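The compatibility rules between qsft_args and the signal's query_args described earlier in this section can be checked before running a transform. The helper below is a hypothetical sketch, not part of the package; it only encodes the rules as stated in this README, using the dictionary keys shown in the examples.

```python
def check_compatible(qsft_args, query_args):
    """Return a list of problems; an empty list means the arguments agree."""
    problems = []
    # Sizes used by the transform may not exceed what the signal sampled.
    for key in ("num_subsample", "num_repeat", "b"):
        if qsft_args[key] > query_args[key]:
            problems.append(f"{key}: {qsft_args[key]} > {query_args[key]}")
    # The reconstruction methods must match the signal's delay methods.
    for part in ("source", "channel"):
        if qsft_args[f"reconstruct_method_{part}"] != query_args[f"delays_method_{part}"]:
            problems.append(f"{part} method mismatch")
    # The coded setting additionally requires a decoder handle.
    if qsft_args["reconstruct_method_source"] == "coded" and "source_decoder" not in qsft_args:
        problems.append("coded source method needs source_decoder")
    return problems


example_query_args = {"num_subsample": 3, "num_repeat": 2, "b": 4,
                      "delays_method_source": "identity",
                      "delays_method_channel": "nso"}
example_qsft_args = {"num_subsample": 2, "num_repeat": 2, "b": 4,
                     "reconstruct_method_source": "identity",
                     "reconstruct_method_channel": "nso"}
problems = check_compatible(example_qsft_args, example_query_args)  # no problems
```

Returning a list rather than raising on the first mismatch makes it easy to report every inconsistent argument at once.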
The TestHelper is an abstract class used to encapsulate the complete pipeline of sampling, data storage, data loading and sparse Fourier transformation.
It contains a single abstract method generate_signal that needs to be overridden when inheriting TestHelper.
The only argument of the generate_signal method is the dictionary signal_args that is provided to the helper object at object creation.
The generate_signal method needs to be implemented such that for a given signal_args dictionary, it returns the corresponding Signal object.
For instance, the SyntheticHelper class inherits TestHelper and overrides the generate_signal method as follows.
from qsft.test_helper import TestHelper
from synt_exp.synt_src.synthetic_signal import SyntheticSubsampledSignal


class SyntheticHelper(TestHelper):
    def generate_signal(self, signal_args):
        return SyntheticSubsampledSignal(**signal_args)

Then a SyntheticHelper object needs to be created with the following arguments:
TestHelper(signal_args,
methods,
subsampling_args,
test_args,
exp_dir)

Here, the arguments are as follows:

- The signal_args argument is directly provided to the generate_signal method and used to generate Signal objects.
- The methods argument is a list of strings that determines which algorithms are going to be used with the helper object. Possible options are "qsft", "coded_qsft" and "lasso".
- The subsampling_args argument is a dictionary that contains num_subsample (number of different subsampling matrices), num_repeat (number of repetitions in coding), and b (inner dimension of subsampling).
- The test_args argument is a dictionary that contains n_samples (number of test samples).
- The exp_dir argument is an optional argument that specifies the directory in which to save the samples and sub-transforms for later use. If the provided directory contains previously computed samples and sub-transforms, they are loaded instead of being computed again.
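The load-instead-of-recompute behaviour described for exp_dir follows a common caching pattern, sketched below. The function load_or_compute, the pickle format, and the file naming are assumptions for illustration; the package's actual on-disk layout may be different.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(exp_dir, name, compute):
    """Load `name` from exp_dir if it was saved before, else compute and save it."""
    path = Path(exp_dir) / f"{name}.pickle"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive_sampling():
    calls.append(1)  # record that the expensive step actually ran
    return [0.5, -1.0, 2.0]


with tempfile.TemporaryDirectory() as tmp_dir:
    first = load_or_compute(tmp_dir, "samples", expensive_sampling)
    second = load_or_compute(tmp_dir, "samples", expensive_sampling)
```

The second call finds the saved file and returns its contents, so expensive_sampling runs only once.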
For instance, the following code creates a SyntheticHelper object
methods = ["qsft"]
subsampling_args = {
"num_subsample": 5,
"num_repeat": 3,
"b": 7,
}
test_args = { "n_samples": 200000 }
helper = SyntheticHelper(signal_args, methods, subsampling_args, test_args, exp_dir)

At the time of object creation, the signal object is generated and subsampled. To compute the model using the samples, we call the compute_model method with the following arguments:

- method: algorithm to be used. Possible options are "qsft", "coded_qsft" and "lasso".
- model_kwargs: If method is "qsft" or "coded_qsft", it needs to be a dictionary with fields "num_subsample", "num_repeat", "b", and "noise_sd" (the standard deviation of the noise, used to determine the threshold for bin identification). The values for "num_subsample", "num_repeat", and "b" must be less than or equal to the values in signal_args provided to the TestHelper object at the time of creation. Even if sampling is done for larger values, we can compute the models for lower values of these arguments using a subset of the samples. If method is "lasso", it needs to be a dictionary with fields "n_samples" (the number of uniformly chosen samples) and "noise_sd".
For instance, we can run
method = "qsft"
model_kwargs = {
"num_subsample": 2,
"num_repeat": 2,
"b": 7,
"noise_sd": 0.01
}
helper.compute_model(method, model_kwargs)
In addition to implementing QSFT, we also include a comparison with LASSO implemented via
group-lasso, which is significantly slower for this application.
The following figures compare LASSO and QSFT. These figures were generated by
using the scripts at synt_exp/run-tests-complexity-vs-size.py and plotted by synt_exp/plot-complexity-vs-size.py.
The grey area in the first graph is a region where LASSO took too long to converge.
As we can see, the runtime of
As the amount of noise in the signal/function increases, successful recovery becomes more difficult. To examine this
phenomenon, the script synt_exp/run-tests-nmse-vs-snr.py is useful. In the graph below, we see that for different
sparsity levels $S$, QSFT goes from a very high to a low NMSE at some threshold. This type of phase transition
behaviour is typical in compressed sensing.
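For reference, a common definition of normalized mean squared error (NMSE) between a true and an estimated sparse transform is the squared error divided by the energy of the true transform. The sketch below uses that definition on dictionaries of coefficients; it is an assumption that this matches the normalization used by the scripts in this repository.

```python
def nmse(true_coeffs, estimated_coeffs):
    """Squared error between two sparse transforms, normalized by the true energy."""
    keys = set(true_coeffs) | set(estimated_coeffs)
    # Coefficients missing from either dictionary are treated as zero.
    error = sum(abs(true_coeffs.get(k, 0) - estimated_coeffs.get(k, 0)) ** 2 for k in keys)
    energy = sum(abs(v) ** 2 for v in true_coeffs.values())
    return error / energy


truth = {(1, 0): 2.0, (0, 2): 1.0j}
estimate = {(1, 0): 2.0, (2, 2): 0.5}
value = nmse(truth, estimate)  # (1 + 0.25) / (4 + 1) = 0.25
```

An NMSE near 0 means the recovered coefficients closely match the true ones, while missing the support entirely gives values of about 1 or more.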
This repository also provides an example of how to apply our code to a complex
ViennaRNA.
Code for this example is in the rna_exp folder. We create the RnaSubsampledSignal(SubsampledSignal) class. The
subsample(self, query_indices) function interfaces with the ViennaRNA package to compute the minimum free energy (MFE)
of an RNA sequence.
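One step such a subsample function needs is turning each query index (a vector over an alphabet of size $q=4$) into a nucleotide string before passing it to ViennaRNA. A minimal sketch of that mapping follows; the function name and the letter ordering are assumptions, and the call to ViennaRNA itself is left out so that the example runs without the package installed.

```python
NUCLEOTIDES = "ACGU"  # assumed ordering; the repository may use a different one


def indices_to_sequences(query_indices):
    """Map each 4-ary index vector to an RNA string, one letter per position."""
    return ["".join(NUCLEOTIDES[symbol] for symbol in index) for index in query_indices]


sequences = indices_to_sequences([(0, 1, 2, 3), (3, 3, 0, 0)])  # ["ACGU", "UUAA"]
```

Each resulting string would then be folded with ViennaRNA to obtain its MFE, which becomes the signal value returned for that query index.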
The graph above shows that when QSFT function achieves a low NMSE. This means that QSFT
generates a sparse Fourier transform that is able to compute the MFE of an arbitrary unseen RNA sequence with
relatively little error.




