

by Gaurav Pandey


Datasink is a customizable pipeline for generating diverse ensembles of heterogeneous classifiers, together with the metadata required by ensemble learning approaches that exploit ensemble diversity to improve performance. It also fairly evaluates the performance of several ensemble learning methods, including greedy selection, enhanced selection [Caruana2004], and stacked generalization (stacking) [Wolpert1992]. Although other tools exist, we are unaware of a similarly modular, scalable pipeline designed for large-scale ensemble learning.

Datasink was developed to support research by Sean Whalen and Gaurav Pandey (see [Whalen2013]) with the support of the Icahn Institute for Genomics and Multiscale Biology at Mount Sinai.

Datasink is designed for generating extremely large ensembles, which can take days or weeks to produce, so it begins with a data generation phase tuned for multicore and distributed computing environments. That phase outputs a set of compressed CSV files containing the class distribution produced by each classifier. These files serve as input to a later ensemble learning phase.
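To make the two-phase idea concrete, here is a minimal Python sketch of how a later ensemble learning step might consume per-classifier predictions and run a simple greedy forward selection. It is not Datasink's code or file format: the column layout (`label` followed by one probability column per classifier), the classifier names, the binary labels, and the helper functions `read_predictions`, `auc`, and `greedy_selection` are all assumptions made for illustration.

```python
import csv
import gzip
import io


def read_predictions(fileobj):
    """Parse a CSV whose first column is the true label and whose remaining
    columns hold one classifier's predicted positive-class probability each.
    (Hypothetical layout, not Datasink's actual output format.)"""
    reader = csv.reader(fileobj)
    names = next(reader)[1:]
    labels, columns = [], {name: [] for name in names}
    for row in reader:
        labels.append(int(row[0]))
        for name, value in zip(names, row[1:]):
            columns[name].append(float(value))
    return labels, columns


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def greedy_selection(labels, columns):
    """Forward selection without replacement: repeatedly add the classifier
    that most improves the AUC of the averaged prediction, and stop as soon
    as no candidate improves it."""
    selected, best = [], 0.0
    remaining = set(columns)
    while remaining:
        scored = []
        for name in sorted(remaining):
            members = selected + [name]
            averaged = [
                sum(columns[m][i] for m in members) / len(members)
                for i in range(len(labels))
            ]
            scored.append((auc(labels, averaged), name))
        score, name = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best


# Toy data standing in for one compressed prediction file (made-up values).
raw = (
    "label,clf_a,clf_b,clf_c\n"
    "1,0.9,0.6,0.4\n"
    "1,0.8,0.4,0.6\n"
    "0,0.2,0.5,0.5\n"
    "0,0.1,0.3,0.5\n"
    "1,0.7,0.2,0.6\n"
    "0,0.3,0.7,0.4\n"
)
compressed = io.BytesIO(gzip.compress(raw.encode("utf-8")))
with gzip.open(compressed, "rt", newline="") as handle:
    labels, columns = read_predictions(handle)

selected, best = greedy_selection(labels, columns)
print(selected, best)
```

In this toy example one classifier already separates the classes perfectly, so the selection stops after a single member. A fair evaluation would select on a validation split and report performance on held-out data, which this sketch omits for brevity.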

