Posted: Wed Oct 17, 2012 1:13 pm
Hi,
Thanks for allowing GREAT to be invoked via command line.
I am finding that when I run a job without a background file, the time taken is far less than when I use a background. Is this expected behaviour?

More worryingly, when I *don't* use a bg file, the results file ('batch' mode, generating tsv) is ~2-3 MB in size, but when I do use a bg file, the tsv is ~99 MB! I ran a test using the same foreground with and without the background, and opened the resulting tsv files in Excel. The p-values differ, suggesting that the background information is changing the ORA results. But why is the file so large? Is there any way I can make it smaller?

Here is the command I use with background file (printed in R):
[1] "wget -O STEM2GREAT_tmp/10K/STEMoutput/GREAT_output//Liver_10K_121016_3PM_150_2_profile120.mm9.GREATresults.tsv \"http://bejerano.stanford.edu/great/public/cgi-bin/greatStart.php?outputType=batch&requestSpecies=mm9&requestName=Example+Data&requestSender=Pai+S&requestURL=<baseUrldeleted>Liver_10K_121016_3PM_150_2_profile120.mm9&bgURL=<baseUrldeleted>Liver_10K_121016_3PM_150_2_ALLPROBES.mm9\""
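
For readers scripting this request, here is a minimal sketch of assembling the same kind of batch URL in Python. The parameter names (outputType, requestSpecies, requestName, requestSender, requestURL, bgURL) are taken from the command above; the function name and the example.org file URLs are hypothetical placeholders. Note that urlencode percent-encodes the nested file URLs, which the original command does not do; the thread does not confirm how the server treats either form.

```python
from urllib.parse import urlencode

# Endpoint copied from the wget command in this post.
GREAT_START = "http://bejerano.stanford.edu/great/public/cgi-bin/greatStart.php"


def build_great_batch_url(fg_url, species, bg_url=None,
                          request_name="Example Data", sender="Pai S"):
    """Return a batch-mode request URL (hypothetical helper, not part of GREAT)."""
    params = {
        "outputType": "batch",
        "requestSpecies": species,
        "requestName": request_name,
        "requestSender": sender,
        "requestURL": fg_url,
    }
    # bgURL is only sent for a foreground/background run.
    if bg_url is not None:
        params["bgURL"] = bg_url
    return GREAT_START + "?" + urlencode(params)


if __name__ == "__main__":
    # example.org URLs are placeholders for publicly reachable region files.
    print(build_great_batch_url("http://example.org/fg.mm9", "mm9",
                                bg_url="http://example.org/bg.mm9"))
```

Leaving bg_url unset gives the whole-genome-background form of the request, which makes it easy to run both variants from the same script.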

Second question: Is there any way to change rules for region-gene association, using this method of GREAT invocation?

Thanks in advance,
Shraddha
Posted: Fri Oct 19, 2012 10:57 am
Site Admin
Hi Shraddha,

For your question 1:
It is not entirely unexpected for the fg/bg test to take longer than the whole-genome background test. There are many reasons why it may take longer. One simple one is that in the fg/bg test GREAT needs to download two files, and if the background file is large, that can take a while.

The results file can also be larger because, when GREAT outputs the results, it includes all the background regions that may be associated with a given term (again, if the background is large, GREAT will be outputting a lot of data). We currently do not provide compressed output (we will consider adding this in a future version), so you will have to compress the results on your end to save space.
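
As an illustration of compressing the results on the client side, here is a minimal sketch using Python's standard gzip module. The function name and file paths are hypothetical; it simply writes a .gz copy next to the downloaded tsv.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path=None):
    """Write a gzip-compressed copy of src_path and return the new path."""
    if dst_path is None:
        dst_path = src_path + ".gz"
    # Stream the copy so a ~100 MB tsv is never held in memory at once.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

The original file is left in place, so it can be deleted separately once the compressed copy has been checked.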

If you think something may be wrong and want us to investigate further, please submit one of your jobs to the web interface and send us your job ID (found under the Job Description section).

For your question 2:
We currently don't support a way to change rules for region-gene association using the programming interface. We will also consider adding this feature in a future version.

Thanks,
Bejerano Team
Posted: Fri Nov 23, 2012 8:00 am
Hi,
I'm writing with a request for the tsv output for GREAT when used via command-line (using wget, 'batch' mode).
Currently the output does not display the number of foreground and background entries. I cannot see this information in either the first few or the last few lines of the tsv file (using "tail -n 30" and "head -n 5").

It would be useful information for downstream analysis scripts; all information pertaining to the run would be in that single file.

If I'm mistaken and the information is easily extractable from the file, please let me know.

Thanks,
Shraddha
Posted: Wed Dec 05, 2012 3:54 pm
Site Admin
Hi Shraddha,

You are correct. GREAT currently does not output such metadata. We will consider adding this in a future release. For now, we would suggest doing this as a post-processing step on the output. Since GREAT's batch mode provides text output, concatenating this information to the files should be pretty straightforward.
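
One way such a post-processing step might look, as a sketch only: it assumes local copies of the foreground and background region files are available, counts their region lines, and writes a copy of the tsv with two extra comment lines on top. The function names and the "# foreground_regions" / "# background_regions" labels are made up for illustration and are not a GREAT convention.

```python
def count_regions(bed_path):
    """Count non-empty lines that are not comments or track/browser headers."""
    n = 0
    with open(bed_path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith(("#", "track", "browser")):
                n += 1
    return n


def prepend_region_counts(tsv_path, fg_bed, bg_bed, out_path):
    """Copy tsv_path to out_path with two '#' metadata lines added at the top."""
    header = (
        f"# foreground_regions\t{count_regions(fg_bed)}\n"
        f"# background_regions\t{count_regions(bg_bed)}\n"
    )
    with open(tsv_path) as src, open(out_path, "w") as dst:
        dst.write(header)
        # Copy the GREAT output through unchanged, line by line.
        for line in src:
            dst.write(line)
    return out_path
```

Writing the counts as comment lines keeps the rest of the file untouched, so downstream scripts that already skip '#' lines should not need changes.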

Thanks,
Bejerano Team
 Post subject: followup on huge files
Posted: Fri Jun 21, 2013 1:50 am
Hi all, I'm trying to use GREAT in batch mode with a background. It works in the sense that I get results; nevertheless, for some runs I get what seem to be truncated files: only GO terms are reported, the info footer is missing, and the results have ~1100 lines.
Any hint?
Posted: Thu Jul 18, 2013 10:08 am
Site Admin
Hi,

It is hard to determine if this is a bug without seeing the actual output file, but do keep in mind that GREAT's export features (http://bejerano.stanford.edu/help/display/GREAT/Export) and the programming interface (http://bejerano.stanford.edu/help/display/GREAT/Programming+Interface) only return the top 500 enrichments.
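
To help tell a truncated download from the 500-enrichment limit, a quick check is to count result rows per ontology in the tsv. The sketch below assumes that comment lines start with '#' and that the first tab-separated column names the ontology; neither assumption is confirmed in this thread, so adjust it to the actual file layout. The function name is hypothetical.

```python
from collections import Counter


def rows_per_ontology(tsv_path):
    """Count data rows per value of the first tab-separated column."""
    counts = Counter()
    with open(tsv_path) as handle:
        for line in handle:
            # Skip blank lines and '#' comment/header/footer lines.
            if not line.strip() or line.startswith("#"):
                continue
            counts[line.split("\t", 1)[0].strip()] += 1
    return counts
```

Comparing these counts against the limit of 500 mentioned above, and checking whether the last line of the file is complete, should give a first indication of whether the download was cut short.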

Let us know if you still think this is a bug.

Thanks,
Bejerano Team