SciTePress - Publication Details

Paper

Authors: Richárd Rádli ; Zsolt Vörösházi and László Czúni

Affiliation: University of Pannonia, 8200 Veszprém, Egyetem u. 10., Hungary

Keyword(s): Metric Learning, Pill Recognition, Multi-Modal Learning, Multihead Attention, Multi-Stream Network, Dynamic Margin Triplet Loss.

Abstract: Pill recognition is a key task in healthcare and has a wide range of applications. In this study, we address the challenge of improving the accuracy of pill recognition in a metric learning framework. A multi-stream visual feature extraction and processing architecture, with multi-head attention layers, is used to estimate the similarity of pills. We introduce an essential enhancement to the triplet loss function that leverages word embeddings to inject textual pill similarity into the visual model. This improvement refines the visual embedding on a finer scale than conventional triplet loss models, resulting in higher accuracy of the visual model. Experiments and evaluations are carried out on a new, freely available pill dataset.
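
The abstract does not state the exact form of the modified loss, so the following is only a minimal sketch of how a text-driven dynamic margin might enter a triplet loss. It assumes the margin grows with the cosine distance between the word embeddings of the anchor and negative pill descriptions. The function names and the `base_margin` and `scale` parameters are hypothetical, chosen for illustration, and are not the authors' formulation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dynamic_margin_triplet_loss(anchor, positive, negative,
                                text_anchor, text_negative,
                                base_margin=0.5, scale=1.0):
    """Triplet loss whose margin depends on textual dissimilarity.

    anchor, positive, negative: visual embeddings (lists of floats).
    text_anchor, text_negative: word embeddings of the pill descriptions.
    Assumption: margin = base_margin + scale * cosine distance of the
    two text embeddings, so textually different pills are pushed
    further apart than textually similar ones.
    """
    margin = base_margin + scale * cosine_distance(text_anchor, text_negative)
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)
```

With a conventional triplet loss the margin is a single constant. Under the assumption above, a negative pill whose description is close to the anchor's is held to a smaller margin than one whose description is very different, which is one way textual similarity could shape the visual embedding at a finer scale.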

CC BY-NC-ND 4.0

Paper citation in several formats:
Rádli, R., Vörösházi, Z. and Czúni, L. (2024). Word and Image Embeddings in Pill Recognition. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP; ISBN 978-989-758-679-8; ISSN 2184-4321, SciTePress, pages 729-736. DOI: 10.5220/0012460800003660

@conference{visapp24,
author={Richárd Rádli and Zsolt Vörösházi and László Czúni},
title={Word and Image Embeddings in Pill Recognition},
booktitle={Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP},
year={2024},
pages={729-736},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012460800003660},
isbn={978-989-758-679-8},
issn={2184-4321},
}

TY - CONF
JO - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP
TI - Word and Image Embeddings in Pill Recognition
SN - 978-989-758-679-8
IS - 2184-4321
AU - Rádli, R.
AU - Vörösházi, Z.
AU - Czúni, L.
PY - 2024
SP - 729
EP - 736
DO - 10.5220/0012460800003660
PB - SciTePress