BigNeuron

Frequently Asked Questions

Q1. I do not want someone else to convert or (re)implement my algorithms, evaluate them, and claim a certain performance. How will BigNeuron handle this?

A. BigNeuron will NOT change the core parts of any entered algorithm; it will instead use the original implementation from the respective group. However, BigNeuron has clearly defined requirements that the input image data format and the output neuron reconstruction format be the same for all methods. Moreover, every algorithm has to be accompanied by a single best parameter set (fine-tuned by the developers beforehand) so that it can run on large-scale data sets without stopping. If a manual-operation component is needed, it must be defined before the batch processing begins.

Q2. To what extent should we enforce implementation of a neuron tracing method in Vaa3D? What should be the programming language of the plugin?

A. The Vaa3D plugin interface provides a non-invasive linkage to any third-party code. The programming language for creating a plugin is C/C++. The standardized protocols for porting algorithms will be illustrated with many examples and thorough documentation, with hands-on support provided during dedicated hackathons.
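
For orientation, a minimal sketch of such a plugin is given below. The class and method names follow publicly documented Vaa3D plugin examples (V3DPluginInterface2_1 with funclist/dofunc); the exact headers and signatures should be verified against the Vaa3D source (v3d_interface.h), and the tracing logic itself is only indicated by comments.

    // Hypothetical skeleton of a Vaa3D tracing plugin (C++/Qt). Names and
    // signatures follow publicly documented Vaa3D examples and should be
    // verified against v3d_interface.h in the Vaa3D source tree.
    #include <QtGui>
    #include "v3d_interface.h"

    class MyTracerPlugin : public QObject, public V3DPluginInterface2_1
    {
        Q_OBJECT
        Q_INTERFACES(V3DPluginInterface2_1)

    public:
        // One command-line callable function, suitable for batch bench-testing.
        QStringList funclist() const { return QStringList() << "trace"; }

        bool dofunc(const QString &func_name,
                    const V3DPluginArgList &input, V3DPluginArgList &output,
                    V3DPluginCallback2 &callback, QWidget *parent)
        {
            if (func_name != "trace") return false;
            // 1. Read the input 3D image named in the input argument list.
            // 2. Run the tracing algorithm with its single pre-tuned parameter set.
            // 3. Write the reconstruction to the SWC file named in the output list.
            return true;
        }

        // GUI entry points (not used during batch bench-testing).
        QStringList menulist() const { return QStringList(); }
        void domenu(const QString &menu_name, V3DPluginCallback2 &callback,
                    QWidget *parent) {}
        float getPluginVersion() const { return 1.0f; }
    };
    // A plugin-export macro (e.g. Q_EXPORT_PLUGIN2 under Qt4) is also required
    // so that Vaa3D can load the shared library.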

Q3. How can I incorporate code written in Matlab, Java, Python or another language other than C/C++?

A. BigNeuron is a very large-scale project, and enforcing a unified API is critical to ensure a fair comparison for any pre-defined assessment. We therefore discourage the use of Matlab, Java, Python, or other programming languages besides C/C++ for this bench testing. Nevertheless, if a wrapper for a language other than C/C++ can be created that uses the Vaa3D plugin interface to read images and generate SWC-format neuron reconstructions, it will be included in the bench testing.
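
As an illustration of this wrapper idea only (not an official BigNeuron recipe), a thin C++ plugin function could delegate the actual tracing to an external program written in another language, as long as image input and SWC output are still handled through the plugin interface. The interpreter, script name, and file arguments below are placeholders.

    // Sketch of a helper that delegates tracing to an external script.
    // "my_tracer.py" and the file names are placeholders, not real BigNeuron tools.
    #include <QProcess>
    #include <QString>
    #include <QStringList>

    int run_external_tracer(const QString &image_file, const QString &swc_file)
    {
        QStringList args;
        args << "my_tracer.py" << image_file << swc_file;
        // Blocks until the external tracer finishes and returns its exit code.
        return QProcess::execute("python", args);
    }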

Q4. What is the “ground truth” or “gold standard” for evaluation?

A. The most practical possibility is to annotate a subset of the data, without the participants knowing which part. This requires running the algorithms on all of the data while still yielding reliable estimates of performance. Efficient annotation can be facilitated by Vaa3D’s Virtual Finger functions. We encourage data contributors to provide as many manually generated or curated reconstructions as possible; these will be used to evaluate the algorithms. We will also organize an Annotation Workshop for experts to generate such “ground truth” or “gold standard” reconstructions.

It should be noted that, in the strict scientific sense, there is no “ground truth” for the majority of these data sets, as true “ground truth” would have to be generated using completely different methods that reflect the physical 3D morphology of the neurons. In this sense, the human-generated reconstructions are merely “gold standard” data to assist evaluation. Such “gold standard” evaluation data should be generated by multiple human annotators in order to measure the variability of manual reconstructions in a multi-dimensional morphometric space. Comparison with machine-generated reconstructions would then allow one to estimate the type and bounds of errors for any of the algorithms. Good algorithms will be those that consistently approach the human consensus (mean) reconstructions to within the manual variation (i.e., the standard deviation of each metric in the multi-dimensional space).
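
One way to make the “within manual variation” criterion concrete (a sketch only, not a finalized evaluation protocol): for each morphometric metric k, let μ_k and σ_k be the mean and standard deviation of that metric over the reconstructions produced by multiple human annotators. An automated reconstruction R would then be considered consistent with the human consensus on that data set if |m_k(R) − μ_k| ≤ σ_k for every metric k, where m_k(R) is the value of metric k computed on R.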

Q5. Will it be possible to get better reconstructions by somehow combining the output of individual methods (for example by majority voting)?

A. Yes, that plan is part of the analysis stage after bench-testing.
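
As an illustration only (the actual combination strategy will be decided with the community during the analysis stage), a simple voxel-wise majority vote over the outputs of several methods could look like the sketch below. Each reconstruction is assumed to have been rasterized into a binary mask of identical dimensions; that rasterization step is not shown.

    // Sketch of voxel-wise majority voting across several reconstructions.
    #include <cstddef>
    #include <vector>

    std::vector<unsigned char> majority_vote(
        const std::vector< std::vector<unsigned char> > &masks)  // one mask per method
    {
        const std::size_t n_methods = masks.size();
        const std::size_t n_voxels  = masks.empty() ? 0 : masks[0].size();
        std::vector<unsigned char> consensus(n_voxels, 0);

        for (std::size_t v = 0; v < n_voxels; ++v)
        {
            std::size_t votes = 0;
            for (std::size_t m = 0; m < n_methods; ++m)
                votes += (masks[m][v] != 0);
            // Keep a voxel only if more than half of the methods marked it.
            consensus[v] = (2 * votes > n_methods) ? 1 : 0;
        }
        return consensus;
    }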

Q6. When will this project be announced?

A. We plan to make a public announcement around late March 2015. Before then, we will continue smaller-scale testing, finalize the protocols, etc.

Q7. How many algorithms will this project consider?

A. It will be a worldwide community effort including 20 or more neuron tracing methods.

Q8. How many single neuron data sets will this project consider?

A. BigNeuron will bench test from a minimum of 20,000 to upwards of around 100,000 single-neuron data sets from different species.

Q9. What is the range of data volumes in this project?

A. Each data set will correspond to one single neuron, with volumes ranging from approximately 100 mega-voxels to more than 200 giga-voxels.

Q10. What is the range of species in this project?

A. The data sets will be as diverse as possible and so far include insect nervous systems (e.g. fruit fly CNS and PNS, dragonfly thoracic ganglion cells), zebrafish and mouse retinal neurons, and human and mouse cortical neurons.

Q11. Will most of the neurons in the testing be insect (e.g. Drosophila) neurons? How will the unbalanced sizes of the neuron data sets be handled?

A. In terms of the imaging data already available for a diverse population of neurons, this will likely be true. However, the number of neurons is not the only measure of data set diversity. Tracing difficulty also depends on the neuron’s scale (size), and mammalian neurons typically encompass a much greater (100x or even 1000x) volume than insect neurons. Thus, the amount of “unit data” traced (see the definition below) is also a useful indicator of bench-testing scale. In the pilot, all available species will be cross-tested.

Q12. What is a single “data unit” or “unit data” for input image data?

A. We define a “data unit” as the lesser of 1 giga-voxel (1024x1024x1024 voxels) and the total volume of the 3D image stack. A 3D image stack of 150 mega-voxels therefore constitutes one data unit, while a 3D image stack of 2.5 giga-voxels corresponds to 2.5 data units. In some cases, the latter data set might also be counted as 3 data units (i.e. the ceiling of 2.5) when NeuronAssembler is used to split the image and assemble the pieces for reconstruction; in these cases, the number of data units also parallels the total number of major data I/O operations.
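
As a small illustration of this bookkeeping (a hypothetical helper, not part of the bench-testing code), the number of data units for a stack can be computed from its voxel count:

    // Number of data units for one image stack, per the definition above.
    // One data unit = min(1 giga-voxel, total stack volume).
    #include <cmath>

    double data_units(double total_voxels, bool round_up_for_assembler = false)
    {
        const double GIGA_VOXEL = 1024.0 * 1024.0 * 1024.0;
        double units = total_voxels / GIGA_VOXEL;
        if (units < 1.0) units = 1.0;                     // e.g. a 150 mega-voxel stack = 1 unit
        return round_up_for_assembler ? std::ceil(units)  // e.g. 2.5 -> 3 units with NeuronAssembler
                                      : units;            // e.g. 2.5 units otherwise
    }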

Q13. What will be the maximum allowed run time on a supercomputer core?

A. Up to 1 hour will be allowed for a neuron tracing method to process one “data unit” on one supercomputer core.

Q14. How many people (and who exactly) will run the bench testing on supercomputers?

A. The bench-testing jobs will be generated jointly by the algorithm developers and the BigNeuron core group, with assistance from the staff of the supercomputer facilities. A small number of participant representatives will be selected from among algorithm developers, data contributors, and organizers to submit the jobs transparently, to ensure the fairest possible bench-testing.

Q15. What computing resources will BigNeuron require?

A. We are planning a bench-test database of ~100k+ single-neuron images (totaling 20TB+ of image data) run against 20 or more different methods. Some algorithms will run quickly and others more slowly, but those that are too slow will be terminated. All programs will be executed as cluster jobs, and every neuron should take no more than one CPU hour.
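
As a rough upper bound (assuming one job per image per method, and that every job runs for the full allowance): 100,000 images × 20 methods × 1 CPU-hour ≈ 2,000,000 CPU-hours; images spanning more than one data unit would add proportionally more.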

Q16. How much data will BigNeuron generate?

A. We estimate that 0.5 to 2.5 million reconstructions will be produced in this project across different species, imaging modalities, brain regions, and reconstruction algorithms before December 2015, corresponding to about 1-2 TB of SWC-format files.

Q17. Who will look at and make sense of all the data the project will generate?

A. BigNeuron is a community project, so we will invite the research community to access the reconstruction results and jointly devise strategies for data analysis, including but not limited to comparing against manually generated reconstructions, producing statistical mean reconstructions to approximate the available “gold standards,” estimating different types of variations and error bounds, etc.

Q18. Is BigNeuron a crowd-sourcing project like EyeWire, etc.? If not, what is the difference?

A. BigNeuron is a community project, but not a crowd-sourcing project. We are not asking the public to annotate reconstructions. Instead, we ask developers and trained experts to provide different neuron reconstruction algorithms and any available manual tracings, and to join in the bench-testing and analysis in a collaborative and professional spirit. Future editions of BigNeuron will include a crowd-sourcing and annotation component, to build the reference database as comprehensively as we can.

Q19. Will the standardized data format ensure compatibility with CRCNS.org, the Allen Institute database, and NeuroMorpho.org?

A. The reconstruction format will be SWC, which is compatible with all known public resources. Other meta-data formats, such as Vaa3D’s linker file format, are also standardized for easy use.
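
For reference, an SWC file is a plain-text list of sample points, one per line, with seven columns: sample id, structure type, x, y, z, radius, and parent id (-1 for the root). The values below are purely illustrative:

    # id type x      y      z     radius parent
    1    1    512.0  480.5  63.0  5.0    -1
    2    3    518.2  486.1  64.0  1.2     1
    3    3    525.7  493.4  66.5  1.0     2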

Q20. Will interactive/manual/semi-automatic tracing methods be considered for bench testing?

A. No.

Q21. Will a learning based tracing method be considered for bench testing?

A. Yes, but the learning portion has to be completed off-line before the method is bench-tested on the entire data set.

Q22. Will 2D image data be considered in this project?

A. No, but we hope the 3D methods resulting from this project will also be very helpful for analyzing 2D data sets. A future edition will also include piloting 4D, real-time analysis programs.

Q23. Will BigNeuron seek or accept industrial sponsorship?

A. Industrial sponsorship is welcome, provided such support does not affect the open accessibility of algorithms and data. We will be seeking additional funding to support the effort early in 2015.

Q24. Are there any IP issues for BigNeuron?

A. This project will be organized in the spirit of open-source methods and open-access data. We do not anticipate any IP issues for this project.

Q25. What is the timeline of BigNeuron?

A. The first phase is planned for one year, including a 3-month large-scale bench-test on supercomputers followed by community-oriented analysis of the results (see detailed timeline for more information).

Q26. Will algorithms be evaluated on the whole data set or in different categories of data set, such as neuron type categories or microscopy categories?

A. The analyses of the data produced in this project will be carried out by the community in late 2015. We do not constrain how the data may be analyzed.

Q27. Should I provide raw tiles or stitched images?

A. Since image stitching is not a major goal of this project, we encourage most data contributors to provide stitched images. In some extreme cases, we might be able to allocate resources to stitch very large image data sets; this should be discussed case by case.

Q28. Do I have to rely on NeuronAssembler? Can I use my own assembler, or another strategy such as automatically moving from one data unit to the next to continue tracing?

A. NeuronAssembler is provided as a convenient framework that extends the capability of an individual tracing method from processing a small image to processing a very large image. It is non-invasive and allows a few different options for how a large neuron is assembled. For the sake of smooth bench testing, we encourage interested parties who have alternative assembling approaches to merge their assembling algorithms into NeuronAssembler as additional options, so that the different tracing algorithms can still be invoked.
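
For intuition, the overall divide-and-conquer pattern that any such assembler implements can be sketched as follows. This is generic illustrative C++, not NeuronAssembler’s actual API; all type and function names are placeholders with stubbed-out bodies.

    // Generic tile-trace-merge loop for a very large image (placeholder names only).
    #include <cstddef>
    #include <vector>

    struct Tile {};            // one "data unit"-sized sub-volume
    struct Reconstruction {};  // a partial or merged SWC-style tree

    std::vector<Tile> split_into_tiles() { return std::vector<Tile>(); }  // placeholder
    Reconstruction trace_tile(const Tile &) { return Reconstruction(); }  // any single-tile tracer
    void merge_into(Reconstruction &, const Reconstruction &) {}          // stitch partial traces

    Reconstruction assemble_large_neuron()
    {
        Reconstruction whole;
        std::vector<Tile> tiles = split_into_tiles();   // cut the large image into data units
        for (std::size_t i = 0; i < tiles.size(); ++i)
            merge_into(whole, trace_tile(tiles[i]));    // trace each tile, then merge
        return whole;
    }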

Q29. Does each image contain a single neuron only? Do I need to consider soma detection?

A. Yes, each image contains only one neuron. In most cases, soma locations should be detected by the respective algorithms themselves. For tracing very large image data sets using NeuronAssembler, we may start from the same pre-defined soma locations.

Q30. Will there be a standard API for accessing metadata such as voxel resolutions and microscope specification?

A. The test data will all be converted to a common format for the very large-scale testing. Because of the heterogeneity of image data from different sources, we do not plan to provide a standard API for accessing metadata from the original data. However, general descriptions of the different data sets will be provided, and in these descriptions we will try to include as much voxel-resolution and microscope-specification information as possible, where available from the data sources.

Q31. When both axons and dendrites are included in the image data set, is the automated reconstruction process supposed to identify which is which? How about detection and measurement of dendritic spines and axonal varicosities?

A. The first phase of BigNeuron will not aim at separating axons from dendrites, or at detecting spines or axonal varicosities. However, such information can often be obtained more easily by analyzing the respective images together with the 3D reconstructed neurons. In addition, for selected data sets in which such neuronal features are clearly visible, we encourage interested researchers to build more sophisticated tools for these purposes.

Q32. Should individual algorithms use their own preprocessing functions, or can they use many image analysis functions or plugins in Vaa3D?

A. This project will handle preprocessing related to file format and data scale, but not denoising, anisotropic filtering, etc. Vaa3D does provide such functions, and they are available for any algorithm developer to use; however, the use of these or of alternative dependency libraries within a tracing method should be considered part of that method, and the method should be made self-contained.

Q33. Does this project bench test algorithms that rely on certain APIs in commercial library or libraries with certain license limitations?

A. This project will consider any algorithm that can be bench-tested. This requires the dependency libraries to be openly available (and free of any malware). During the first phase of the BigNeuron project, it would be hard to bench-test commercial software packages in a convenient and fair manner, unless the respective software vendors make such packages free to test and compatible with our bench-testing platforms.

Q34. What is the potential impact of this project on a single lab that provides the data or algorithm?

A. In short, the BigNeuron project is specifically beneficial to small or single research labs. Data contributors get a free copy of the neuron reconstructions produced by many different algorithms; these data can be used for further analyses or in publications, which represents a substantial resource saving. For algorithm contributors, BigNeuron will provide a standard platform for bench-testing and comparing various methods on massive-scale data sets. This will clearly define the state of the art in neuron reconstruction research and development, helping algorithm contributors to carry out more rigorous method studies. For data analysts, the BigNeuron project will produce a vast amount of single-neuron morphology data for different species, all generated with a standard protocol. With this kind of high-quality data, analysts will be able to avoid many of the hassles of current neuroscience data processing and focus on real scientific discovery. In addition, BigNeuron will facilitate much easier communication among research groups, as all the protocols will be standardized and publicly shared.

Q35. Some existing algorithms are better than others. Do you assign different weights to different algorithms in finding a final consensus reconstruction?

A. This is one possibility for generating a consensus reconstruction. Since this is a question of data analysis, we also encourage the community to propose alternatives.

Q36. It might turn out that some algorithms work better for some types of neurons than others. Does the system allow changing algorithms and parameters for adapting to the specific neuron types, brain regions, and species?

A. For each algorithm, we allow a few different parameter configurations (currently planned to be four) to be used in bench testing, and only the best result will be taken as the final result of that algorithm for any specific data set.

Q37. Do you plan to hold workshops to teach people how to use the BigNeuron tools?

A. Such workshops could be planned in future editions of the project. We will also prepare documentation for the tools developed in this project and release it to the public.

Q38. Will the data sets include tree-like structures other than neurons, such as blood vessels and lung structures?

A. The BigNeuron project will focus primarily on neuronal structures. It might also include data sets of other tree-like structures, such as glial cells, vasculature, etc. To ensure the success of bench-testing in the first phase, we will give priority to data sets that contain single neurons.

Q39. Can algorithm developers who focus on related problems such as EM-oriented neuron reconstruction, synapse and cell counting, visualization, or blood vessel segmentation, etc., participate in the BigNeuron hackathons?

A. The BigNeuron hackathons focus on porting automated neuron reconstruction algorithms onto the common Vaa3D platform. These hackathons will have clearly defined goals and timelines. At this point we anticipate that a limited amount of resources, especially complimentary hotel rooms, will be available to host developers who are interested in porting other kinds of algorithms. We encourage interested people to contact the organizers as soon as possible to check availability. If there is enough interest, we might also organize additional satellite hackathons for these related problems and techniques.

Q40. I cannot attend BigNeuron hackathons or workshops. How can I port my neuron reconstruction method or contribute my annotations?

A. In this case, you are encouraged to contact the BigNeuron project directly at bigneuron@alleninstitute.org to arrange an alternative way to port your algorithm and participate in the project.