Haute Ecole de Gestion de Genève

Using crowdsourcing for multi-label biomedical compound figure annotation

Description: 

Abstract. Information analysis and retrieval for images in the biomedical literature must deal with a large number of compound figures (figures containing several subfigures), as they probably constitute more than half of all images in repositories such as PubMed Central, which was the data set used for the task. In 2015 and 2016, the ImageCLEFmed benchmark proposed, among other tasks, a multi-label classification task that aims to evaluate the automatic classification of figures into 30 image types. This task was based on compound figures, so the figures were distributed to participants both as compound figures and in separated form. A gold standard was therefore required so that participants’ algorithms could be evaluated and compared. This work presents the process carried out to generate the multi-labels of ∼2650 compound figures using a crowdsourcing approach. Automatic algorithms were used to separate compound figures into subfigures, and the results were then validated or corrected via crowdsourcing. The image types (MR, CT, X-ray, ...) were also annotated by crowdsourcing, with detailed quality control to ensure the quality of the annotated data as far as possible. In total, ∼625 hours were invested at a cost of ∼$870.
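
As an illustration of how several crowd workers' answers for one figure might be merged into a single multi-label annotation, the following is a minimal sketch using majority voting. The abstract does not state which aggregation or quality-control rule was used, so the function name `aggregate_labels`, the `min_agreement` threshold and the example votes are all assumptions made for this example.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.5):
    """Keep every label chosen by more than `min_agreement` of the workers.

    `votes` holds one set of labels per worker for a single figure.
    This is a hypothetical aggregation rule, not the one from the paper.
    """
    if not votes:
        return set()
    # Count each label at most once per worker.
    counts = Counter(label for worker in votes for label in set(worker))
    threshold = min_agreement * len(votes)
    return {label for label, n in counts.items() if n > threshold}


# Three hypothetical workers label the image types found in one compound figure.
votes = [{"CT", "MR"}, {"CT"}, {"CT", "MR", "X-ray"}]
print(sorted(aggregate_labels(votes)))
```

Raising `min_agreement` trades recall for precision: with a value of 0.9 only labels that nearly all workers agree on would survive.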

Semantic social media analysis of Chinese tourists in Switzerland

Description: 

In recent years, Sina Weibo, a Twitter-like social network service in China, has attracted attention from scholars in the domain of information systems, as the spread and influence of users’ opinions are increasingly important, particularly in the tourism industry. This study examined the behaviors of Chinese tourists in Switzerland by adopting a semantic-based linked data methodology. A total of 103,778 Weibo messages shared with Swiss locations were collected between January 2013 and April 2015. We addressed questions about Chinese travelers’ profiles, trends in keywords, and differences between first-time and repeat visitors. Moreover, we implemented a semantic search engine by employing linked data technologies to provide useful information about Chinese tourists in Switzerland, both for the tourism industry and individual tourists.

Using the cloud as a platform for evaluation and data preparation

Description: 

This chapter gives a brief overview of the VISCERAL Registration System that is used for all the VISCERAL Benchmarks and is released as open source on GitHub. The system can be accessed by both participants and administrators, reducing the direct participant–organizer interaction and handling the documentation available for each of the benchmarks organized by VISCERAL. Also, the upload of the VISCERAL usage and participation agreements is integrated, as well as the attribution of virtual machines that allow participation in the VISCERAL Benchmarks. In the second part, a summary of the various steps in the continuous evaluation chain mainly consisting of the submission, algorithm execution and storage as well as the evaluation of results is given. The final part consists of the cloud infrastructure detail, describing the process of defining requirements, selecting a cloud solution provider, setting up the infrastructure and running the benchmarks. This chapter concludes with a short experience report outlining the encountered challenges and lessons learned.

VISCERAL: evaluation-as-a-service for medical imaging

Description: 

Systematic evaluation has had a strong impact on many data analysis domains, for example, TREC and CLEF in information retrieval, ImageCLEF in image retrieval, and many challenges in conferences such as MICCAI for medical imaging and ICPR for pattern recognition. Kaggle, a platform for machine learning challenges, has also had significant success in crowdsourcing solutions. This shows the importance of systematically evaluating algorithms and that the impact is far larger than that of simply evaluating a single system. Many of these challenges also showed the limits of the commonly used paradigm of preparing a data collection and tasks, distributing these and then evaluating the participants’ submissions. Extremely large datasets are cumbersome to download, while shipping hard disks containing the data becomes impractical. Confidential data, such as medical data or data from company repositories, often cannot be shared. Real-time data will never be available via static data collections, as the data change over time and data preparation often takes a long time. The Evaluation-as-a-Service (EaaS) paradigm tries to find solutions for many of these problems and has been applied in the VISCERAL project. In EaaS, the data are not moved but remain on a central infrastructure. In the case of VISCERAL, all data were made available in a cloud environment. Participants were provided with virtual machines on which to install their algorithms. Only a small part of the data, the training data, was visible to participants. The major part of the data, the test data, was only accessible to the organizers, who ran the algorithms in the participants’ virtual machines on the test data to obtain impartial performance measures.

Text- and content-based medical image retrievals in the VISCERAL retrieval benchmark

Description: 

Text- and content-based retrieval are the most widely used approaches for medical image retrieval. They capture the similarity between the images from different perspectives: text-based methods rely on manual textual annotations or captions associated with images; content-based approaches are based on the visual content of the images themselves such as colours and textures. Text-based retrieval can better meet the high-level expectations of humans but is limited by the time-consuming annotations. Content-based retrieval can automatically extract the visual features for high-throughput processing; however, its performance is less favourable than the text-based approaches due to the gap between low-level visual features and high-level human expectations. In this chapter, we present the participation of our joint research team of USYD/HES-SO in the VISCERAL retrieval task. Five different methods are introduced, of which two are based on the anatomy–pathology terms, two are based on the visual image content and the last one is based on the fusion of the aforementioned methods. A comparison of the methods indicated that the text-based methods outperformed content-based retrieval, and that the fusion of text and visual content generated the best performance overall.
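
To make the idea of fusing text-based and content-based results concrete, here is a minimal sketch of one common late-fusion scheme: min-max normalisation of each score list followed by a weighted sum. The abstract does not describe the fusion rule that was actually used, so the functions `min_max` and `fuse`, the weight `text_weight` and the example scores are assumptions for illustration only.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(text_scores, visual_scores, text_weight=0.7):
    """Rank documents by a weighted sum of normalised text and visual scores.

    A document missing from one modality contributes 0 for that modality.
    """
    text_n, visual_n = min_max(text_scores), min_max(visual_scores)
    fused = {
        doc: text_weight * text_n.get(doc, 0.0)
        + (1.0 - text_weight) * visual_n.get(doc, 0.0)
        for doc in set(text_n) | set(visual_n)
    }
    # Highest fused score first; ties are broken by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical scores from a text-based and a content-based retrieval run.
text = {"a": 2.0, "b": 1.0, "c": 0.0}
visual = {"a": 0.1, "b": 0.9, "d": 0.5}
print(fuse(text, visual))
```

Weighting the text scores more heavily reflects the reported finding that text-based methods outperformed content-based ones; the value 0.7 itself is arbitrary.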

Retrieval of medical cases for diagnostic decisions: VISCERAL retrieval benchmark

Description: 

Health providers currently construct their differential diagnosis for a given medical case most often based on textbook knowledge and clinical experience. Data mining of the large number of medical records generated daily in hospitals is only very rarely done, limiting the reusability of these cases. As part of the VISCERAL project, the Retrieval benchmark was organized to evaluate available approaches for medical case-based retrieval. Participant algorithms were required to find and rank relevant medical cases from a large multimodal dataset (including semantic RadLex terms extracted from text and visual 3D data) for common query topics. The relevance assessment of the cases was done by medical experts who selected cases that are useful for a differential diagnosis for the given query case. The approaches that integrated information from both the RadLex terms and the 3D volumes (mixed techniques) obtained the best results based on five standard evaluation metrics. The benchmark setup, dataset description and result analysis are presented.

Combining radiology images and clinical metadata for multimodal medical case-based retrieval

Description: 

As part of their daily workload, clinicians examine patient cases in the process of formulating a diagnosis. The large multimodal patient datasets stored in hospitals could help in retrieving relevant information for a differential diagnosis, but they are currently not fully exploited. The VISCERAL Retrieval Benchmark organized a medical case-based retrieval algorithm evaluation using multimodal (text and visual) data from radiology reports. The common dataset contained patient CT (Computed Tomography) or MRI (Magnetic Resonance Imaging) scans and RadLex term anatomy–pathology lists from the radiology reports. A content-based retrieval method for medical cases that uses both textual and visual features is presented. It defines a weighting scheme that combines the anatomical and clinical correlations of the RadLex terms with local texture features obtained from the region of interest in the query cases. The visual features are computed using a 3D Riesz wavelet texture analysis performed on a common spatial domain to compare the images in the analogous anatomical regions of interest in the dataset images. The proposed method obtained the best mean average precision in 6 out of 10 topics and the highest number of relevant cases retrieved in the benchmark. Since it obtains robust results for various pathologies, it could be developed further to perform medical case-based retrieval on large multimodal clinical datasets.
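
For readers unfamiliar with the metric named above, the following sketch shows the standard definition of average precision for one query topic and its mean over several topics. It is not the benchmark's evaluation code; the function names and the example case identifiers are assumptions, and each ranked list is assumed to contain no duplicates.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant cases."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, case in enumerate(ranked, start=1):
        if case in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precision over (ranked, relevant) pairs, one per topic."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical query topics with made-up case identifiers.
runs = [
    (["c1", "c2", "c3", "c4"], {"c1", "c3"}),
    (["c5", "c6"], {"c6"}),
]
print(round(mean_average_precision(runs), 4))
```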

Social Media Nutzung von Schweizer DMOs 2016

Productivity convergence across US states in the public sector: an empirical study

Description: 

Abstract. This paper examines the productivity of the public sector across states throughout the United States. States are heterogeneous in the public services they provide, which could affect their productivity; at the same time, there could be convergence among the states. The services provided by the public sector have come under increased scrutiny with the ongoing process of reform in recent years. Unlike the private sector, the public sector lacks contestable markets and the information and incentives those markets provide, so performance information, particularly measures of comparative performance, has been used to gauge the productivity of the public service sector. The research methodology marries exploratory (i.e. Kohonen clustering) and empirical (panel model) techniques via the Cobb-Douglas production function. Given the homogeneity across states in the use of a standard currency, it will be easy to identify the nature of the convergence process in the public sector across states throughout the United States.
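
For reference, the textbook Cobb-Douglas production function and the log-linear form usually estimated in a panel model are given below. The abstract does not state the paper's exact specification, so the symbols are assumptions: output $Y_{it}$, capital $K_{it}$ and labour $L_{it}$ for state $i$ in year $t$, a state-specific productivity term $A_i$, elasticities $\alpha$ and $\beta$, and an error term $\varepsilon_{it}$.

```latex
% Textbook Cobb-Douglas form (symbols are illustrative assumptions)
Y_{it} = A_i \, K_{it}^{\alpha} \, L_{it}^{\beta}

% Taking logarithms gives a specification that is linear in the parameters
\ln Y_{it} = \ln A_i + \alpha \ln K_{it} + \beta \ln L_{it} + \varepsilon_{it}
```

In this form, $\ln A_i$ plays the role of a state fixed effect, which is one reason the log-linear version is convenient for panel estimation.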

Learning with feature side-information

Description: 

Very often features come with their own vectorial descriptions which provide detailed information about their properties. We refer to these vectorial descriptions as feature side-information. The feature side-information is most often ignored or used for feature selection prior to model fitting. In this paper, we propose a framework that allows for the incorporation of feature side-information during the learning of very general model families. We control the structures of the learned models so that they reflect features’ similarities as these are defined on the basis of the side-information. We perform experiments on a number of benchmark datasets which show significant predictive performance gains, over a number of baselines, as a result of the exploitation of the side-information.
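
One simple way to picture a model whose structure reflects feature similarities is a penalty that grows when similar features receive very different weights in a linear model. The sketch below is a deliberately simplified linear special case and not the framework proposed in the paper, which targets much more general model families. The Gaussian similarity, the function names and the example vectors are all assumptions.

```python
import math


def similarity(u, v, bandwidth=1.0):
    """Gaussian similarity between two side-information vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def side_info_penalty(weights, side_info, bandwidth=1.0):
    """Sum over feature pairs of similarity times squared weight difference."""
    penalty = 0.0
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            s = similarity(side_info[i], side_info[j], bandwidth)
            penalty += s * (weights[i] - weights[j]) ** 2
    return penalty


# Features 0 and 1 have nearly identical side-information; feature 2 is far away.
side_info = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0]]
consistent = [1.0, 1.0, -2.0]    # similar features share a weight
inconsistent = [1.0, -2.0, 1.0]  # similar features disagree
print(side_info_penalty(consistent, side_info) < side_info_penalty(inconsistent, side_info))
```

Adding such a term to a training loss would push the weights of similar features together while leaving dissimilar features essentially unconstrained.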


The Swiss economic information portal

© 2016 Infonet Economy
