
postdoc_drr

Pre-interview exercise

Before your interview for the Postdoctoral Fellow position, we would like you to consider a hypothetical data analysis task to show us some of your skills. Please write a short report (less than a page) describing how you would approach the analysis below if you were assigned it. It is not necessary to download the data and tools if time does not permit; we just want to understand how you would approach this problem. The short report is a hypothetical exercise focusing conceptually on how you would perform this analysis.

-How do you understand the tasks?

-What are the steps?

-What are the possible alternative choices/tools that you might have available?

-If you have time, you can use an R object here and make some plots as a brief exploratory analysis. Example code and a tutorial can be found here.

We respect your time, so please do not spend more than an hour on this pre-interview exercise. If something is too detailed to address in a one-hour analysis, please skip it.

Please send the report to wcrismani@svi.edu.au no later than 4pm the day before your interview.

Hypothetical analysis

(You do not need to download the raw data and tools if time does not permit. Please just describe how you would attempt the analysis, as specified above.)

Introduction

Before your interview for the Postdoctoral Fellow position we would like you to complete a small data analysis task to show us some of your skills.

The task is to conduct a study of meiotic crossover events using bulk whole genome sequencing data from 5 mice (described further below).

We would like you to present a short report (either in a document or notebook) that you will submit ahead of the interview. We may discuss your report in the formal interview or during the day, depending on time.

More specifically, we would like to see:

  • A short consideration of quality control for the data;
  • Calling crossovers (haplotype transitions) using sgcocaller or other software that you consider appropriate;
  • One or more plots about crossover frequencies, positioning and any other features that you consider interesting;
  • A brief discussion of any conclusions you reach (or cannot reach) from your analysis.
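To make the crossover-calling step concrete, here is a deliberately simplified Python sketch of haplotype-transition calling. It is not sgcocaller's method (sgcocaller uses an HMM-based approach); it simply pools reads over fixed windows of informative SNPs. It assumes that per-SNP counts of B6-allele reads and total reads are already available for one chromosome of one mouse, and the function name `call_transitions` and its default parameters are hypothetical. In a BC1F1 mouse, a window should show a B6 read fraction near 0.5 where the genotype is B6/FVB and near 0 where it is FVB/FVB, so 0.25 is used as a midpoint cutoff.

```python
from typing import List, Tuple


def call_transitions(
    snps: List[Tuple[int, int, int]],  # (position, b6_reads, total_reads) per informative SNP
    window: int = 50,                  # hypothetical: SNPs pooled per window
    het_threshold: float = 0.25,       # midpoint between expected fractions 0.5 and 0
) -> List[Tuple[int, int]]:
    """Return approximate crossover intervals as (left_pos, right_pos).

    SNPs are grouped into consecutive windows of `window` SNPs. A window is
    labelled B6/FVB if its pooled B6 read fraction exceeds `het_threshold`,
    otherwise FVB/FVB. A label change between adjacent windows is reported as
    a crossover between the last SNP of one window and the first SNP of the next.
    """
    snps = sorted(snps)
    labels = []  # (first_pos, last_pos, is_het) for each window with reads
    for start in range(0, len(snps), window):
        chunk = snps[start:start + window]
        b6 = sum(s[1] for s in chunk)
        total = sum(s[2] for s in chunk)
        if total == 0:
            continue  # no reads in this window, so no genotype information
        labels.append((chunk[0][0], chunk[-1][0], b6 / total > het_threshold))
    transitions = []
    for prev, cur in zip(labels, labels[1:]):
        if prev[2] != cur[2]:
            transitions.append((prev[1], cur[0]))
    return transitions
```

The window size trades breakpoint resolution for robustness: at 1-5x coverage a single SNP carries very little information, but a transition can only be localised to the gap between two windows.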

Be prepared to briefly discuss the choices you make in your analysis. There are many ways to conduct appropriate analyses, so we are not looking for any specific approach: whatever works to complete the tasks is completely fine.

The mouse genomic data

The mice are from what is referred to as a "BC1F1" generation. Two inbred mice, one from each of the strains C57BL6 and FVB, were first crossed. The resulting F1 mice were then backcrossed to an FVB inbred mouse, producing the "BC1F1" mice. It is these BC1F1 mice that were sequenced.
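The expectations implied by this design can be written down with standard Mendelian reasoning. The formulas below are not taken from the exercise materials, and they ignore sequencing and mapping error. The FVB parent always contributes an FVB allele, while the F1 parent's gamete carries either the B6 or the FVB allele at each locus. Each BC1F1 mouse therefore carries exactly one recombinant gamete from the F1 parent, and the crossovers detectable in that mouse are those in that single gamete. Since one Morgan is defined as one expected crossover per gamete, the expected count is the summed map length. Here d_c is the genetic length of autosome c in the sex of the F1 parent; male and female mouse maps differ, and the X chromosome would need separate handling.

```latex
% Genotype at any autosomal locus of a BC1F1 mouse (F1 gamete x FVB gamete)
P(\mathrm{B6/FVB}) = P(\mathrm{FVB/FVB}) = \tfrac{1}{2}

% Expected B6 allele fraction among reads at an informative SNP
f_{\mathrm{B6}} =
\begin{cases}
  1/2 & \text{if the locus is B6/FVB} \\
  0   & \text{if the locus is FVB/FVB}
\end{cases}

% Expected crossovers per BC1F1 mouse, summed over autosomes c (d_c in cM)
\mathbb{E}[N_{\mathrm{CO}}] = \sum_{c} \frac{d_c}{100}
```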

The fastq files, a list of variants, and metadata can be downloaded here.
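A first quality-control step is to restrict the variant list to clean, informative SNPs. The sketch below assumes the list is a VCF. It also assumes the reads are aligned to the C57BL/6J-derived mouse reference, so REF would be the B6 allele and ALT the FVB allele. The function name `informative_snps` is hypothetical. It keeps only single-base, biallelic sites that pass filters.

```python
from typing import Iterable, Iterator, Tuple


def informative_snps(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, ref, alt) for PASS biallelic SNPs from VCF text lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        # Keep single-base REF/ALT only: drops indels and multi-allelic sites ("T,G")
        if len(ref) != 1 or len(alt) != 1 or alt == "." or ref == "N":
            continue
        # Keep sites that passed filtering (or were never filtered)
        if filt not in ("PASS", "."):
            continue
        yield chrom, int(pos), ref, alt
```

In practice the same filter can be applied with `bcftools view -v snps -m2 -M2 -f PASS`. Further filters, such as excluding low-mappability or repetitive regions, would also reduce spurious crossover calls.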

Fastq files for five mice are provided. Whole-genome coverage may be in the range of 1-5x. If this level of coverage is expected to be problematic, please let me know.
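To judge whether 1-5x coverage is problematic, a quick calculation helps. Assume read depth at a site is Poisson with mean equal to the coverage, and each read at a B6/FVB site carries either allele with probability 0.5. Then B6 and FVB read counts are independent Poisson variables with half that mean. The function name `coverage_expectations` is hypothetical.

```python
import math


def coverage_expectations(mean_depth: float) -> dict:
    """Per-site probabilities at a B6/FVB heterozygous SNP under Poisson read depth.

    Assumes depth ~ Poisson(mean_depth); by Poisson thinning, B6 reads and FVB
    reads are independent Poisson(mean_depth / 2) counts.
    """
    half = mean_depth / 2
    p_one_allele = 1 - math.exp(-half)  # at least one read of a given allele
    return {
        "p_covered": 1 - math.exp(-mean_depth),  # at least one read of any allele
        "p_b6_seen": p_one_allele,               # at least one B6 read
        "p_both_alleles": p_one_allele ** 2,     # site visibly heterozygous
    }


for depth in (1, 2, 3, 4, 5):
    print(depth, coverage_expectations(depth))
```

At 1x, only about 15% of heterozygous sites show both alleles; at 5x, about 84% do. Per-site genotype calls are therefore unreliable at this coverage. Information has to be pooled across many neighbouring SNPs, which is what window- or HMM-based callers do.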

Guidance for the analysis and resources

We suggest detecting crossovers with sgcocaller, unless you are already familiar with another appropriate tool. Documentation and tutorials on using sgcocaller can be found at the following two links:

https://gitlab.svi.edu.au/biocellgen-public/sgcocaller

https://biocellgen-public.svi.edu.au/hinch-single-sperm-DNA-seq-processing/Crossover-identification-with-sscocaller-and-comapr.html

Once you have output from sgcocaller, you can perform some exploratory analysis using comapr or any other software that you find appropriate.

Questions to consider:

  • What is the expected crossover frequency?
  • How confident are you in the output of sgcocaller?
  • A common fault in crossover analyses is 'dubious' close double crossovers that are supported by only a few variants, perhaps all within the same read. These typically arise from reads mapped to regions that are not their true genomic origin. Are there any dubious close double crossovers in the sgcocaller output?
  • Do you have enough samples to draw any biological conclusions?
  • How would you approach crossover detection if, in contrast to this experiment (which backcrossed genetically identical F1s to produce BC1F1s), an experiment used multiple genetically unique F4 mice that were interbred to produce F5 pups?
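One way to screen for dubious close double crossovers is sketched below. It assumes the called crossovers have already been turned into an ordered list of genotype segments along one chromosome. A segment whose two neighbours share the same genotype implies two flanking crossovers. It is flagged if it is supported by few SNPs, or if it spans fewer bases than a single read fragment could cover. The `Segment` class, the function name, and both default thresholds are hypothetical and would need tuning to the actual library and SNP density.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: int    # first SNP position in the segment (bp)
    end: int      # last SNP position in the segment (bp)
    n_snps: int   # informative SNPs supporting the segment's genotype
    is_het: bool  # True = B6/FVB, False = FVB/FVB


def flag_dubious_dco(
    segments: List[Segment],
    min_snps: int = 10,        # hypothetical minimum SNP support
    min_span_bp: int = 1000,   # hypothetical: above typical short-read fragment length
) -> List[int]:
    """Return indices of internal segments forming a suspicious close double crossover."""
    flagged = []
    for i in range(1, len(segments) - 1):
        left, mid, right = segments[i - 1], segments[i], segments[i + 1]
        # Double crossover: neighbours agree with each other but differ from the middle
        if left.is_het == right.is_het != mid.is_het:
            if mid.n_snps < min_snps or (mid.end - mid.start) < min_span_bp:
                flagged.append(i)
    return flagged
```

Flagged segments could then be checked directly, for example by inspecting whether their supporting alleles come from the same read pair, or whether the region has low mappability.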

This task is not meant to be unpleasant or overly onerous. We only expect you to spend 1-2 hours working on it. If you find yourself stuck, feel free to reach out to Wayne for some pointers.
