Abstract
We describe a system for meta-analysis where a wiki stores numerical data in a simple comma-separated values format and a web service performs the numerical statistical computation. We initially apply the system on multiple meta-analyses of structural neuroimaging data results. The described system allows for mass meta-analysis, e.g., meta-analysis across multiple brain regions and multiple mental disorders providing an overview of important relationships and their uncertainties in a collaborative environment.
Finn Årup Nielsen—Thanks to the Lundbeck Foundation for the funding of the Center for Integrated Molecular Brain Imaging (CIMBI).
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
- Major Depressive Disorder
- Topic Ontology
- Lateral Ventricle Volume
- Pituitary Gland Volume
- Semantic MediaWiki
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
1 Introduction
The scientific process aggregates a large number of scientific results into a common scientific consensus. Meta-analysis performs the aggregation by statistical analysis of numerical values presented across scientific papers. Collaborative systems such as wikis may easily aggregate text and values from multiple sources. However, so far they have had limited ability to apply numerical analysis as required, e.g., by meta-analysis.
Researchers have discussed the advantages and disadvantage of the tools for conducting systematic reviews from “paper and pencil”, over spreadsheets to RevMan and web-based specialized applications [1]: Setup cost, versatility, ability to manage data, etc. In 2009 they concluded that “no single data-extraction method is best for all systematic reviews in all circumstances”. For example, RevMan and Archie of the Cochrane Library provide an elaborate system for keeping track and analyzing textual and numerical data in meta-analyses, but the system could not import information from electronic databases [1]. Our original meta-analyses [2, 3] relied on the Microsoft Excel spreadsheets later distributed on public web sites. Compared to an ordinary spreadsheet a wiki solution provides data entry provenance and collaborative data entry with immediate update. Shareable folders on cloud-based storage systems would help collaboration on spreadsheets. Online services, such as the spreadsheet of Google Docs, may lack meta-analytic plotting facility. Web-based specialized applications for systematic reviews may have a high setup cost [1].
We have previously explored a simple online meta-analysis system—a “fielded wiki”—in connection with personality genetics [4]. As implemented specifically for this scientific area the web service lacks generality for other types of meta-analytic data. Furthermore the system relied on PubMed or Brede Wiki to represent bibliographic information.
Following Ward Cunningham’s quote “What’s the simplest thing that could possibly work?” we present a simple system that allows for mass meta-analysis of numerical data presented as comma-separated values (CSV) in a standard MediaWiki-based wiki, — the Brede Wiki: http://neuro.compute.dtu.dk/wiki/. We present the data in an Open science data fashion where other researchers can easily reuse the data.
2 Data and Data Representation
We use the MediaWiki-based Brede Wiki to represent the data [5]. For our neuroimaging data each data record usually consists of three values (number of subjects, the mean and standard deviation across subjects of the volume of brain structures). The individual study typically compares two such data records, e.g., from a patient and a control group. We also record labels for the data record, e.g., the biographic information, as well as extra subject information about the two groups, such as age, gender and clinical characteristics, so we get seven or more as the total number of data items for each study. Each meta-analysis will usually determine what extra relevant information we may include, and it may differ between studies, e.g., a Y-BOCS value has typically only relevance for obsessive-compulsive disorder (OCD) patients. The functional neuroimaging area has CogPO and Cognitive Atlas ontologies [6, 7] enabling researchers to describe the topic of an experiment, but these efforts do not directly apply to our data. One CSV line carries the information for each study.
Separate wiki pages store—rather than uploaded files—the CSV data, so the MediaWiki template functionality can transclude the CSV data on other wiki pages. By convention pages with CSV information have the “.csv” extension as part of the title so external scripts can recognize them as special pages and the wiki pages have no wiki markup. Another approach instead of extensions could have used a dedicated MediaWiki namespace for the CSV pages, such that these pages would have the prefix CSV: or similar.
A few MediaWiki extensions can format CSV information: SimpleTable Footnote 1 and TableData. Figure 1 shows the transclusion of CSV data with a modified version of the SimpleTable extension. A MediaWiki template handles the transclusion and also establish web links to download and edit the CSV data.
The Brede Wiki uses the standard template system for recording structured bibliographic data about the publication, and a script that reads through the wiki dumps can represent the data in relational databases [5].
To annotate the CSV information we also use templates. Figure 2 shows the markup of three different CSV pages with information about major depressive, bipolar and obsessive-compulsive disorders, where the title field indicates the title of the CSV page in the wiki and the topic fields link to items in the Brede Wiki topic ontology. Presently, no controlled vocabulary beyond the template fields describes the columns in the CSV. To generate an appropriate content-type (text/csv) a bridging web script functions as a proxy, so a download of the CSV page can spawn a client-side spreadsheet program.
The bulk of the data currently presented in the wiki comes from the large mass meta-analysis of volumetric studies on major depressive disorder reporting over 50 separate meta-analyses for individual brain regions [2]. Further data comes from mass meta-analyses across multiple brain regions on bipolar disorder [3] and first-episode schizophrenia [8], a meta-analysis on longitudinal development in schizophrenia [9] as well as data from individual original studies on obsessive-compulsive disorder.
Apart from neuroimaging studies the Brede Wiki also records data from meta-analyses from a few other studies outside neuroimaging [10, 11], allowing us to test the generality of the framework. We distribute the data under Open Data Commons Open Database License (ODbL).
3 Web Script and Meta-Analysis
The web script for meta-analysis reads the CSV information by downloading it from the wiki, identifies the required columns for meta-analysis, performs the statistical computations and makes meta-analytic plots— the so-called forest and funnel plots—in the SVG format, see Fig. 3. From either the title information or a PubMed identifier the script generates back-links from the generated page to pages on the wiki. The script may also export the computed results as JSON or CSV. Furthermore, it may generate a small R script that sets up the data in variables and use the meta R library for meta-analysis. Finally the script itself can return its source code enabling other researchers to readily inspect the code and more easily reproduce the results [12].
The web script attempts to guess the separator used on the CSV page and also tries to match the elements of the column header, e.g., the strings “control n”, “controls number”, “number of controls”, etc. match for number of control subjects. With no matches the user needs to explicitly specify the relevant columns via URL parameters, which in turn a wiki template can setup. Figure 2 displays three templates where the middle template explicitly specify which columns the web script should interpret as “number of experimentals”, “mean of experimentals” and “standard deviation of experiments”.
Standard meta-analysis computes an effect size from each result in a paper and computes a combined meta-analytic effect size and its confidence interval. Although the methodological development continues, there exist established statistical analysis approaches for ordinary meta-analysis [10]. Our system implements computations on the standardized mean difference for continuous variables and on the natural logarithm of the odds ratio for categorical variables with fixed and random effects methods using an inverse-weighted variance model. As an extra option we provide meta-analysis on the natural logarithm of the variance ratio [13], for comparison of the standard deviations between two groups of subjects. Meta-analysts seldom use this type of summary statistics compared to the standard effect size, but in a few areas such statistics generate interest, e.g., in the controversy over whether males show larger variance in intelligence than females [11].
With continuous data where we have \(\bar{x}_e\), \(s_e\) and \(n_e\) as the mean, standard deviation and number of subjects for the experimental group (e.g., patients) and \(\bar{x}_c\), \(s_c\) and \(n_c\) as the corresponding numbers for the control group (e.g., healthy controls) we compute an effect size \(d\) for each study included in the meta-analysis by
With binary data we compute the logarithm of the odds ratio as
where \(n_{ee}\) and \(n_{ce}\) indicate the number of experimental events and control events (e.g., success of treatment in each of the groups), while the \(c(\cdot )\) function handles the case where no events appear for either the experimental or control group, resulting in a division by zero. For the variance ratio we compute
From the effect sizes and their variances we compute \(Z\)-scores, two-sided \(P\)-values and confidence intervals. Our random effects model gets estimated following the DerSimonian-Laird approach and we furthermore compute Cochran’s homogeneity statistics that gives us a value for how much between-study variance affects each meta-analysis. The web script may return all these numerical results in the JSON format.
Certain types of neuroimaging studies report brain coordinates that may require specialized flexible multivariate meta-analysis [14, 15], for a recent review see [16]. Our system does not yet implement these techniques.
4 Results
We have added 127 pages transcluded with CSV data, — most of which contain data suitable for meta-analysis. For mass meta-analysis we can presently consider 59 meta-analytic datasets. For a single meta-analyses the reading, computation and download finish within seconds. With multiple calls to the web script and JSON output another script can plot multiple meta-analytic results together as in Fig. 5. Generating such a plot takes several minutes: First we use the MediaWiki API of Brede Wiki to get a list of pages which use the meta-analysis template (as shown in Fig. 2). We then download all these pages in the form of raw wiki markup and extract the templates with regular expressions. With this extracted information we call the meta-analytic web script multiple time, letting it return the results in the machine readable JSON format, to get the meta-analytic results that we can plot and tabularize. For generating the page shown in Fig. 3 with results from an individual meta-analysis we need only the CSV data and the web script, while the script that generated Fig. 5 with mass meta-analysis used information defined in templates, CSV data and the web script with no further adaption of MediaWiki.
The specific example shown in Fig. 3 displays results from 5 studies reported in 3 different papers so far entered in the Brede Wiki on the differences between obsessive-compulsive disorder patients and healthy controls in pituitary gland volume [17–19]. In this case the forest plot exposes a large variation between the studies. The colored components in the table and the forest plot link back to pages on the wiki associated with the three papers.
Figure 4 shows the links between the pages in the wiki and to the web script. The middle table gets generated from template definitions and data similar to the data displayed in Fig. 2. It provides links to topic ontology pages (with the page for “Major depressive disorder” shown), the data and result page generated by the meta-analysis web script.
Table 1 lists a part of the mass meta-analytic results sorted according to \(P\)-value of the effect size. Disregarding the non-neuroimaging results we see that results from meta-analyses on hippocampal and lateral ventricles volumes in first-episode schizophrenia and hippocampal volumes in major depressive disorder reach the highest significance. The original published meta-analyses [2, 8] also mentioned these highly significant results prominently.
Whereas an offline script generated Fig. 5 an online web script has generated Fig. 6 with a mass meta-analysis of obsessive-compulsive disorder neuroimaging studies. This script can read a single wiki page with templates that describes meta-analytic CSV data and direct the meta-analytic web script to analyze all that data and return the results in a machine readable format. The mass meta-analytic web script then formats the results in a sortable table and plot it in the L’Abbé-like effect-uncertainty plot, here presently using the Bubble chart from Google Chart Tools.Footnote 2 It takes around 10 s to download and compute the 25 meta-analyses represented on the page for obsessive-compulsive disorder neuroimaging studies. In this case the meta-analysis for the pituitary gland has the largest magnitude of the effect size and thus placed to the extreme right in the plot. As redness indicates heterogeneity among the included studies in the meta-analysis the plot also shows that this particular region has the highest heterogeneity compared to the other 24 meta-analyses. The forest plot in Fig. 3 also display this heterogeneity.
5 Discussion
By using MediaWiki in our present system we exploit the template facility to capture structured information and use free-form wikitext for annotation and comment on the individual scientific papers, — as in semantic academic annotation wikis AcaWiki Footnote 3 and WikiPapers.Footnote 4 We can also use the pages of the wiki as a simple means to keep track of the status of the papers considered for the meta-analysis: potentially eligible, eligible, partially entered and fully entered.
The Internet Brain Volume Database (IBVD)Footnote 5 also records structural neuroimaging data and presents the data online with visualization [20]. It does not entirely meet our needs as the database does not match associated patient and control groups. Nevertheless, the data recorded in IBVD and our system have many overlapping fields, and on pages for papers in the Brede Wiki we enter the IBVD paper identifier, so we can make deep links at the bibliographic level.
Why not Semantic MediaWiki? Semantic MediaWiki (SMW) may query text and numerical data, though has not had the ability to make complex computations. The Semantic Result Formats Footnote 6 extension includes average, sum, product and count result formats enabling simple computations of a series of numerical values, but insufficiently for the kind of computations we require. The data for meta-analysis form an n-ary data record (mean, standard deviation, number of subjects, labels) so either individual SMW pages should store each data record or we should invoke the n-ary functionality in Semantic Internal Objects Footnote 7 SMW extension, SMW record or the recently-introduced subobject SMW functionality. We have not investigated whether these tools provide convenient means for representing our data. The Brede Wiki can export its ontologies defined in MediaWiki template to SKOS. Our future research can consider RDFication of the CSV information through the SCOVO format [21].
Some methods for systematic reviews make two independent data-extractions that require consolidation and resolution of discrepancies to counter data extraction errors, that sometimes appear in meta-analyses [22]. Our simple system does not directly support such a process.
We wrote the web service in Python, where NumPy makes vector computation available and SciPy provides statistical methods necessary for the computation. In a future PHP implementation the script could more closely integrate with the wiki as either a MediaWiki or a SMW extension. Furthermore, new developments in 2012 in MediaWiki with support for the Lua programming language and Wikidata [23] for structured data in MediaWiki have relevance for our system. With the flexibility of Lua a wiki user can write templates that format CSV information into MediaWiki tables and perform numerical computations on the data, — possibly to such an extent that it will make the table rendering MediaWiki extension and the meta-analytic web script unnecessary.
6 Conclusion
A wiki built from standard components provides a inexpensive solution with means to manage meta-analytic data in a collaborative environment. The general framework allows not only the meta-analysis of neuroimaging-derived data but has the potential for managing and analyzing data from many other domains. The system gives an overview of the important effects and uncertainties across numerous meta-analyses that multiple wiki users may extend collaboratively.
We regard the environment as an Open Science system that makes methods and data immediately available in human and machine readable form.
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
References
Elamin, M.B., Flynn, D.N., Bassler, D., Briel, M., Alonso-Coello, P., Karanicolas, P.J., Guyatt, G.H., Malaga, G., Furukawa, T.A., Kunz, R., Schunemann, H., Murad, M.H., Barbui, C., Cipriani, A., Montori, V.M.: Choice of data extraction tools for systematic reviews depends on resources and review complexity. J. Clin. Epidemiol. 62(5), 506–510 (2009)
Kempton, M.J., Salvador, Z., Munafo, M.R., Geddes, J.R., Simmons, A., Frangou, S., Williams, S.C.R.: Structural neuroimaging studies in major depressive disorder: meta-analysis and comparison with bipolar disorder. Arch. Gen. Psychiatry 68(7), 675–690 (2011)
Kempton, M.J., Geddes, J.R., Ettinger, U., Williams, S.C.R., Grasby, P.M.: Meta-analysis, database, and meta-regression of 98 structural imaging studies in bipolar disorder. Arch. Gen. Psychiatry 65(9), 1017–1032 (2008)
Nielsen, F.Å.: A fielded wiki for personality genetics. In: Proceedings of the 6th International Symposium on Wikis and Open Collaboration, New York. ACM (2010)
Nielsen, F.Å.: Brede Wiki: Neuroscience data structured in a wiki. In: Lange, C., Schaffert, S., Skaf-Molli, H., Völkel, M. (eds.) Proceedings of the Fourth Workshop on Semantic Wikis - The Semantic Wiki Web. Volume 464 of CEUR Workshop Proceedings., Aachen, Germany, RWTH Aachen University, pp. 129–133, June 2009
Turner, J.A., Laird, A.R.: The Cognitive Paradigm Ontology: design and application. Neuroinformatics 10(1), 57–66 (2011)
Poldrack, R.A., Kittur, A., Kalar, D., Miller, E., Seppa, C., Gil, Y., Parker, D.S., Sabb, F.W., Bilder, R.M.: The cognitive atlas: toward a knowledge foundation for cognitive neuroscience. Front. Neuroinformatics 5, 17 (2011)
Steen, R.G., Mull, C., McClure, R., Hamer, R.M., Lieberman, J.A.: Brain volume in first-episode schizophrenia: systematic review and meta-analysis of magnetic resonance imaging studies. Br. J. Psychiatry 188, 510–518 (2006)
Kempton, M.J., Stahl, D., Williams, S.C.R., DeLisi, L.E.: Progressive lateral ventricular enlargement in schizophrenia: a meta-analysis of longitudinal MRI studies. Schizophr. Res. 120(1–3), 54–62 (2010)
Hartung, J., Knapp, G., Sinha, B.K.: Statistical Meta-Analysis with Applications. Wiley Series in Probability and Statistics. Wiley, Hoboken (2008)
Irwing, P., Lynn, R.: Sex differences in means and variability on the progressive matrices in university students: A meta-analysis. Br. J. Psychol. 96(4), 505–524 (2005)
Ince, D.C., Hatton, L., Graham-Cumming, J.: The case for open computer programs. Nature 482(7386), 485–488 (2012)
Shaffer, J.P.: Caution on the use of variance ratios: a comment. Rev. Educ. Res. 62(4), 429–432 (1992)
Nielsen, F.Å., Hansen, L.K.: Modeling of activation data in the BrainMap™ database: Detection of outliers. Hum. Brain Mapp. 15(3), 146–156 (2002)
Turkeltaub, P., Eden, G.F., Jones, K.M., Zeffiro, T.A.: A novel meta-analysis technique applied to single word reading. NeuroImage 13(6 (part 2)), S272 (2001)
Radua, J., Mataix-Cols, D.: Meta-analytic methods for neuroimaging data explained. Biol. Mood Anxiety Disord. 2, 6 (2012)
MacMaster, F.P., Russell, A., Mirza, Y., Keshavan, M.S., Banerjee, S.P., Bhandari, R., Boyd, C., Lynch, M., Rose, M., Ivey, J., Moore, G.J., Rosenberg, D.R.: Pituitary volume in pediatric obsessive-compulsive disorder. Biol. Psychiatry 59(3), 252–257 (2006)
Atmaca, M., Yildirim, H., Ozler, S., Koc, M., Kara, B., Sec, S.: Smaller pituitary volume in adult patients with obsessive-compulsive disorder. Psychiatry Clin. Neurosci. 63(4), 516–520 (2009)
Jung, M.H., Huh, M.J., Kang, D.H., Choi, J.S., Jung, W.H., Jang, J.H., Park, J.Y., Han, J.Y., Choi, C.H., Kwon, J.S.: Volumetric differences in the pituitary between drug-naïve and medicated male patients with obsessive-compulsive disorder. Prog. Neuro-psychopharmacol. Biol. Psychiatry 33(4), 605–609 (2009)
Kennedy, D.N., Hodge, S.M., Gao, Y., Frazier, J.A., Haselgrove, C.: The Internet Brain Volume Database: a public resource for storage and retrieval of volumetric data. Neuroinformatics 10(2), 129–140 (2011)
Hausenblas, M., Halb, W., Raimond, Y., Feigenbaum, L., Ayers, D.: SCOVO: using statistics on the web of data. In: Aroyo, L., Traverso, P., Ciravegna, F., Cimiano, P., Heath, T., Hyvönen, E., Mizoguchi, R., Oren, E., Sabou, M., Simperl, E. (eds.) ESWC 2009. LNCS, vol. 5554, pp. 708–722. Springer, Heidelberg (2009)
Gøtzsche, P.C., Hróbjartsson, A., Maric, K., Tendal, B.: Data extraction errors in meta-analyses that use standardized mean differences. JAMA 298(4), 430–437 (2007)
Vrandečić, D.: Wikidata: a new platform for collaborative data collection. In: Proceedings of the 21st international conference companion on World Wide Web, New York, NY, USA, Association for Computing Machinery, pp. 1063–1064 (2012)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nielsen, F.Å., Kempton, M.J., Williams, S.C.R. (2015). Online Open Neuroimaging Mass Meta-Analysis with a Wiki. In: Simperl, E., et al. The Semantic Web: ESWC 2012 Satellite Events. ESWC 2012. Lecture Notes in Computer Science(), vol 7540. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46641-4_19
Download citation
DOI: https://doi.org/10.1007/978-3-662-46641-4_19
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46640-7
Online ISBN: 978-3-662-46641-4
eBook Packages: Computer ScienceComputer Science (R0)