Preface

It is our great pleasure to welcome you to the proceedings of KÉPAF 2025, held in Hévíz, 28-31 January 2025. This year, the conference marks its 15th edition, continuing its tradition as a premier venue in Hungary for sharing the latest research, innovative solutions, and new perspectives in the fields of image processing, computer vision, and pattern recognition.

This year's program showcases 41 regular articles and 5 demos, selected by our esteemed committee members. These papers represent the participants' latest advances across a broad spectrum of topics, including (but not limited to) object recognition, 3D recognition, medical imaging, calibration techniques, AI models, and real-time computer systems. We extend our sincere gratitude to all the authors who submitted their work and contributed to the academic richness of this conference.

The Kuba Attila Prize and the Csetverikov Dmitrij Prize will be awarded separately to the young scientists who authored the best papers, as selected by the dedicated committee. The Kuba Attila Prize recognizes internationally outstanding multidisciplinary work that plays a decisive role in solving fundamental problems across co-disciplines, while the Csetverikov Dmitrij Prize honors new and generally applicable theoretical methods, algorithms, or models. In addition, a prize will be awarded for the best PhD dissertation of the last two years.

We are delighted to host invited speakers, whose contributions to the field have shaped its present and hopefully can inspire our future:

Matej Kristan (University of Ljubljana): HIDRA - a deep model for accurate storm surge flood prediction
Zoltán Juhász (HUN-REN EK, MMA): Searching for traces of musical proto-languages in present-day folk music cultures with self-learning algorithms
Giorgos Tolias (CTU Prague): Visual representation and similarity for instance-level retrieval

We also take this opportunity to celebrate the achievements of past conferences and the outstanding contributions that have been recognized over the years.

Kuba Attila Prize Awards (year, name, organization, article):

2023. Tekla Tóth, ELTE, Tekla Tóth and Levente Hajder: A Minimal Solution for Image-Based Sphere Estimation
2021. Hichem Abdellali, University of Szeged, Hichem Abdellali, Róbert Fröhlich and Zoltán Kató: Robust Absolute and Relative Pose Estimation of a Central Camera System from 2D-3D Line Correspondences
2019. Iván Eichhardt, Institute for Computer Science and Control (SZTAKI), Eichhardt, I.; Csetverikov, D.: Affine Correspondences Between Central Cameras for Rapid Relative Pose Estimation
2017. Dániel Baráth, Institute for Computer Science and Control (SZTAKI), Barath, D.; Matas, J.; Hajder, L.: Multi-H: Efficient Recovery of Tangent Planes in Stereo Images
2015. Attila Börcs, Institute for Computer Science and Control (SZTAKI), Attila Börcs, Balázs Nagy, Csaba Benedek: Real-Time Vehicle Detection in LIDAR Point Cloud Sequences
2013. László Gábor Varga, University of Szeged, László Varga, Péter Balázs, Antal Nagy: A Discrete Image Reconstruction Method Based on Automatic Weighting of Gradient Methods
2011. Csaba Domokos, University of Szeged, Csaba Domokos, Zoltán Kató: Affine Puzzle: Restoring Deformed Object Fragments without Correspondences
2009. Tamás Blaskovics, University of Szeged, Tamás Blaskovics, Zoltán Kató, Ian Jermyn: Segmentation of Circular Objects Using a Markov Random Field
2007. Sándor Fazekas, Computer and Automation Research Institute of the Hungarian Academy of Sciences, Sándor Fazekas, Dmitry Chetverikov: A non-regular optical flow for dynamic textures

PhD Prize Awards (year, name, doctoral school, organization, supervisor(s), thesis title, year of defense):

2019-2020, Dr. Dániel Baráth, Eötvös Loránd University PhD School of Computer Science, Institute for Computer Science and Control, Dr. Levente Hajder, Affine Correspondences and their Applications for Model Estimation, 2019
2017-2018, Dr. Zsolt Sánta, University of Szeged Doctoral School of Computer Science, Institute of Informatics, Dr. Zoltan Kato, Non-Rigid Registration of Visual Objects, 2018
2013-2014, Dr. Milán Magdics, Budapest University of Technology and Economics PhD School of Computer Science, Faculty of Electrical Engineering and Informatics, Prof. László Szirmay-Kalos, GPU-based Particle Transport for PET Reconstruction, 2014
2011-2012, Dr. József Molnár, Eötvös Loránd University PhD School of Computer Science, Institute for Computer Science and Control, Prof. Dmitrij Csetverikov, Variational Methods in Computer Vision, 2011

Conference Committees

Program Co-Chairs: Dániel Baráth (ETH Zürich), András Hajdu (University of Debrecen), László Szirmay-Kalos (Budapest University of Technology and Economics)
Organizing Chair: László Czúni (University of Pannonia)
Members: See the full committee list in the proceedings.

Acknowledgments: We are grateful for the generous support from our sponsors (University of Pannonia, NJSZT, and University of Debrecen), partners, and the broader research community. Their contributions play an essential role in the success of this conference.

We hope that you find this year's proceedings intellectually stimulating and that they spark new ideas, collaborations, and advancements. Thank you for joining us, and we look forward to another exciting year of innovation and discovery in the field of image processing.

Sincerely,

László Czúni, Conference Chair

Dániel Baráth, Program Co-chair

András Hajdu, Program Co-chair

László Szirmay-Kalos, Program Co-chair

25 January 2025

Tuesday

Session #1: Object Detection, Tracking, and Sensor Fusion (chair: Csaba Beleznai)

Demo Session (chair: Csaba Beleznai)

Wednesday

Time	Paper ID	Title	Authors	Abstract
14:10	8	Image-to-Image Translation-Based Deep Learning Application for Object Identification in Industrial Robot Systems	Erdei, Timotei István; Kapusi, Tibor; Hajdu, Andras; Husi, Géza	Industry 4.0 has become one of the most dominant research areas in industrial science today. Many industrial machinery units do not meet the modern standards that would allow image analysis techniques to be used in their commissioning. Intelligent material handling, sorting, and object recognition are not possible with the machinery we have. We therefore propose a novel deep learning approach for existing robotic devices that can be applied to future robots without modification. In the implementation, 3D CAD models of the PCB relay modules to be recognized are also designed for the implantation machine. Alternatively, we developed and manufactured parts for the assembly of aluminum profiles using FDM 3D printing technology, specifically for sorting purposes. We also apply deep learning algorithms based on the 3D CAD models to generate a dataset of objects for categorization using CGI rendering. We generate two datasets and apply image-to-image translation techniques to train deep learning algorithms. The synthesized images achieved sufficient information content and quality to train deep learning algorithms efficiently. As a result, we propose a dataset translation method that is suitable for situations in which regenerating the original dataset can be challenging. The results obtained are analyzed and evaluated for the dataset.
14:15	18	Moving Object Segmentation in LiDAR Point Clouds Using a Minimal Number of Measurements	Madaras, Akos; Rozsa, Zoltan; Szirányi, Tamás	LIDAR point clouds are a rich source of information for autonomous vehicles and ADAS systems. Segmenting moving objects from these data is, however, challenging. Traditional methods rely on a (global or local) map of the environment, whose reconstruction and updating is a challenging task under real-world conditions, especially in the presence of moving objects. This paper proposes a novel approach that performs Moving Object Segmentation (MOS) in LIDAR point clouds using as few measurements as possible, thereby reducing the computational load and enabling map-free processing. Our approach is based on a multimodal learning model with prediction that uses a single modality. The model was trained on a dataset of LIDAR point clouds and the associated camera images, so it learns to associate the features of the two modalities, which enables the prediction of dynamic objects even in the absence of the map and the camera modality. In addition, we propose using semantic information in multi-measurement segmentation to improve the performance metrics. We evaluated our approach on the SemanticKITTI and Apollo real-world autonomous driving datasets. Our results show that the method can achieve state-of-the-art performance in moving object segmentation while using only a few (or even a single) LIDAR measurements. The implementation, with examples and pretrained networks, is available at: https://github.com/madak88/2DPASS-MOS
14:20	12	Performance benchmarking of YOLOv5 and YOLOv8 object recognition models in traffic environment	Ferenczy, András	Advanced Driver Assistance Systems (ADAS) enhance driving convenience and vehicle safety by relying on the accuracy of various sensors equipped in the vehicle. The precision of these sensors is therefore crucial to ensure their proper, effective operation. For the purpose of improving ADAS, a second, drone-based system is advantageous, since it can provide reference data for comparison with the vehicle's sensor data. This paper aims to improve such drone-based reference systems, first identifying and analyzing the primary sources of the associated inaccuracies and their consequences. Afterwards, focusing solely on the imprecision of object detection, the research evaluates the performance of the "You Only Look Once" (YOLO) detection models, analyzing their accuracy and speed. Following the description of the architecture and functioning of these models, the training, testing, comparative analysis, and evaluation of the different versions, namely YOLOv5 and YOLOv8, are conducted. The findings demonstrate that YOLOv8 offers significantly better detection speed (4x faster) and greater accuracy (+11.2% mAP) in identifying smaller, often harder-to-detect traffic participants (pedestrians and two-wheelers) compared to its predecessor, YOLOv5. In contrast, it performs worse (−1.7% mAP) in detecting larger traffic participants (cars and trucks).
14:25	26	Detection and Classification of Small Drones and Birds in Sparse LiDAR Point Clouds	Balla, Krisztian; Keszler, Anita; Gazdag, Sándor; Szirányi, Tamas; Majdik, Andras	Consumer drones pose an increasingly significant security risk to critical infrastructure. Their presence disrupts the operation of airports and aircraft and also raises privacy concerns, such as unauthorized recording. Given the severity of the situation, this paper addresses the detection of small UAVs in sparse point clouds provided by a rosette-scanning LiDAR. Besides distinguishing background and foreground elements, filtering out false positive detections caused by birds and other flying objects is a serious challenge. We use a classification neural network for the latter problem and a semantic segmentation neural network for detection. In the detection phase our method builds on the attention mechanism, which makes recognizing the global features of the point cloud more effective, while classification relies on the local feature maps of convolutional layers. Our test results so far show that our solution can detect and classify potential threats robustly and in real time.
14:30	43	An Artificial Intelligence-Based Solution for Detecting Road Defects Using LiDAR and Camera	Keszler, Anita; Tizedes, László; Szabó, Máté András	Self-driving vehicles are receiving growing attention in smart city development, since the (partial) automation of transport contributes to a more efficient and sustainable urban environment. Automatic detection of road defects and road surface irregularities also plays an important role in achieving safe transport. The Machine Perception Research Laboratory (HUN-REN SZTAKI), in cooperation with the startup D3 Seeron, has developed an artificial intelligence-based system that can identify, among other things, potholes, speed bumps, and manhole covers in advance. The algorithm can assist in the control of self-driving vehicles because, besides identifying obstacles, it also determines their position, allowing the vehicle's speed and trajectory to be adjusted. The system presented in the paper combines the low computational cost of vision-based object recognition with the accurate distance measurement capabilities of LIDAR technology.
14:35	52	Error Analysis of Aerial Image-Based Relative Object Position Estimation	Páncsics, Zsombor; Tóth, Tekla; Hajder, Levente; Nyisztor, Nelli; Juhász, Imre; Treplán, Gergely	This paper presents an analysis of the accuracy and sensitivity of relative object position estimation based on aerial images, examining the following factors: camera tilt, the three-dimensional projection error, and the translation, rotation, and calibration errors of the reference marker. Our specific contribution is that we simulate complex three-dimensional geometries at various camera altitudes (20-130 m) using a simulator. The simulator has a built-in mathematical model that offers a wide set of error parameters in order to improve the evaluability and accuracy of aerial image-based position estimation.

Time	Title	Authors	Abstract
15:50-16:50	Demo 6: Mobile Robots and Sensor Streams in an Open Spatial Computing Platform	Sörös, Gábor; Eger, Sebastian	Spatial computing involves digital content attached to physical locations; however, associating digital content with moving physical agents remains challenging. We present a way of connecting mobile robots and sensors to an open spatial computing platform. We co-localize AR devices and robots with a visual positioning service and align their proprietary on-board motion tracking with a common geographic coordinate system. We share the 3D geo-poses of robots, AR devices, sensors, and other digital objects through a pub-sub message broker. We show interaction with physical and digital objects in multiple views of the world: in a 3D digital twin, in a mobile augmented reality session, and on a traditional map view. Sending a waypoint to a robot is as simple as clicking on the floor in any of the views.
	Demo 39: Augmented Reality-Based Drone Control with Hand Gestures	Bugár-Mészáros, Barnabás; Majdik, András; Szeghy, Géza	Controlling drones with conventional devices is a logical approach, but using hand gestures to change the vehicle's position or velocity offers a more intuitive and flexible solution. In this demonstration we propose a system that enables drone control through augmented reality, where the hand gestures used for control are recognized by a Microsoft HoloLens2 device. Our method allows the drone's speed and rotation to be fine-tuned by changing the position and angle of the controlling hand, and the vehicle can also be moved precisely by specifying a target. We tested the proposed solution in a task involving aerial surveillance of a virtual gas station, using the OptiTrack motion tracking system. The experimental results confirm the applicability of our system in challenging situations that require intuitive drone control.
	Demo 55: Interactive Handshake with Spot Robot using Impedance Control	Saadeh, Elias; Majdik, Andras; Sziranyi, Tamas	The human-robot handshake serves multiple purposes, from enhancing the interaction to increasing acceptance and affinity and establishing a bond between the robot and the human in a collaborative framework. In this paper, we present an interactive demonstration of a human-robot handshake with the Boston Dynamics Spot robot using the dedicated robotic arm mounted on top of the robot. Our system uses the Boston Dynamics software SDK to control the robotic manipulator and grip, achieving an enjoyable, lifelike handshake performance.
	Demo 58: From Knowledge to Survival: Gamifying Disaster Preparedness with B-prepared	Andras Majdik	In case of a disaster, the ideal "Plan A" is the intervention of first responders, but until then, everyone needs a "Plan B" to survive. This project combines virtual reality, gamified mobile applications, knowledge databases, and learning management to teach European citizens the necessary skills for surviving disasters. The holistic approach creates new experiences in location-based and VR games, supported by collaborative knowledge management platforms. The popularity of mobile games and VR headsets is rapidly increasing, and the project leverages this trend to deliver realistic learning experiences on disaster preparedness to European citizens.
	Demo 57: Digital Holographic Microscope with a Fluorescence Detector	Ákos Zarandy	Automated detection of organisms or particles floating in liquids or air requires a volumetric microscope. Holography provides an answer: using coherent illumination, it collects information from an entire volume into a single hologram, from which any plane can be brought into focus afterwards. However, because of the coherent illumination, interference patterns appear in the image, which call for solutions different from the focusing and classification algorithms of conventional microscopes. The demonstration presents the microscope and the specialized algorithms.
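
The publish-subscribe sharing of geo-poses described for Demo 6 can be illustrated with a minimal in-process sketch. This is not the platform's actual broker or message format: the `GeoPose` fields, the `Broker` class, and the topic names are all hypothetical and chosen only to show the pattern, in which publishers and subscribers know a topic name but not each other.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPose:
    """Hypothetical geo-pose record: geographic position plus orientation."""
    lat: float
    lon: float
    height: float
    quaternion: tuple  # (x, y, z, w), assumed convention


class Broker:
    """Toy in-memory pub-sub broker (a stand-in for a real message broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callback to be invoked for every message on this topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of this topic only.
        for callback in list(self._subscribers.get(topic, ())):
            callback(message)


if __name__ == "__main__":
    broker = Broker()
    broker.subscribe("robots/spot/geopose", lambda pose: print("map view got", pose))
    broker.publish("robots/spot/geopose", GeoPose(46.79, 17.19, 110.0, (0.0, 0.0, 0.0, 1.0)))
```

In such a design a new view (digital twin, AR session, map) only needs to subscribe to the topics it cares about, which is presumably why the pattern suits a setting with many heterogeneous devices.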

9:00-10:00 - Invited lecture: Matej Kristan - HIDRA - a deep model for accurate storm surge flood prediction

Session #2: 3D Reconstruction and Scene Understanding (chair: Levente Hajder)

Kuba Attila and Csetverikov Dmitrij Prize Presentations (chair: Zoltan Kato)

Session #3: Medical Imaging and Analysis (chair: Péter Horváth)

Time	Paper ID	Title	Authors	Abstract
10:30	7	Global Structure-from-Motion Revisited	Pan, Linfei; Barath, Daniel; Pollefeys, Marc; Schönberger, Johannes	Recovering 3D structure and camera motion from images has been a long-standing focus of computer vision research and is known as Structure-from-Motion (SfM). Solutions to this problem are categorized into incremental and global approaches. Until now, the most popular systems follow the incremental paradigm due to its superior accuracy and robustness, while global approaches are drastically more scalable and efficient. With this work, we revisit the problem of global SfM and propose GLOMAP as a new general-purpose system that outperforms the state of the art in global SfM. In terms of accuracy and robustness, we achieve results on par with or superior to COLMAP, the most widely used incremental SfM, while being orders of magnitude faster. We share our system as an open-source implementation at https://github.com/colmap/glomap.
10:35	15	Depth completion method for Lidar based sparse depth data	Pálffy, Balázs; Kövendi, József; Benedek, Csaba; Ibrahim, Yahya	This paper presents a novel Lidar-based, single-frame depth data completion method for a moving ego-vehicle in a dynamically varying environment. The approach provides a high-resolution depth image from sparse point cloud input. The effectiveness is demonstrated through simulated data of three different types of low-end Lidar devices, showcasing the potential of entry-level, affordably priced instruments. A Convolutional Neural Network (CNN) is presented as the core, trained in conjunction with a discriminator model. Distance proposals are generated with pixel-wise regression at the last layer of the network, resulting in a spatially accurate complete depth image. To address this challenging objective, feature-based and three-dimensional point cloud-based losses are introduced to assist the training process. In experiments on our synthetic dataset, which includes simulations from three different Lidar instruments, we demonstrate that our solution outperforms two state-of-the-art baseline methods in pixel-wise comparisons to the ground truth for the Livox Avia sensor, with an average RMSE of 5.83 m (baselines: 9.07 and 9.72) and an MAE of 1.10 m (baselines: 2.78 and 2.21). To showcase the model's efficacy, we present our solution's impact through a study on semantic segmentation. Using completed depth data instead of raw Lidar data as input for the segmentation can considerably improve the Intersection over Union (IoU) score from 0.72 (raw depth data as input) to 0.79 (completed depth data as input). Additionally, in an ablation study we examine the importance of the features and losses used.
10:40	20	PCD-VAE: A Permutation Invariant Point-Cloud Variational Auto-Encoder	Kövendi, József; Benedek, Csaba	Generative modeling of uniquely structured three-dimensional set data, such as point clouds, requires capturing local and global geometric features. Applying multi-scale frameworks designed for ordinary, grid-structured data to set data is nontrivial. Set structures require a permutation-invariant feature extraction process to capture multi-scale geometric signatures effectively. In this paper, we propose PCD-VAE, a permutation-invariant Variational Auto-Encoder. Motivated by recent progress in irregular and unordered set encoding, we created PCD-VAE, built on attentive and convolutional modules that process the input set derived from geometric localities within the spatial and the latent domain. Exploiting these modules, our VAE learns a smaller, permutation-invariant latent representation of the input data. We evaluate our model on point cloud generation tasks and achieve competitive results in both compression rate and reconstruction accuracy. Experimental results demonstrate the effectiveness of our proposed method.
10:45	22	Analysis of an Aerial 3D Map for Exploring Areas Not Visible from Above	Bugár-Mészáros, Barnabás; Majdik, András	Detailed 3D reconstruction of unknown areas is a challenging task in many robotics applications. Multi-robot systems that combine the advantages of aerial and ground vehicles are often used for this purpose, exploiting the different perspectives of the sensors to reduce the size of unexplored areas. Because the lower speed of ground robots compared with aerial vehicles can significantly limit exploration efficiency, in this study we propose a cooperative system in which the ground unit only has to map the parts that are not visible from above. These areas are selected based on a structural analysis of the 3D model built by the aerial robot; we then compute a global path connecting them that passes only through traversable parts of the environment. The ground vehicle also builds a 3D model while following the path, which is merged with the initial aerial 3D map when the last target is reached. We tested the proposed system in several virtual environments to demonstrate its applicability to cooperative 3D mapping tasks.
10:50	31	Enhancing Street-Level Environment Modelling Using 3D Gaussian Splatting	Szász, Erik; Zhu, Morui; Vaitkus, Márton; Szántó, Mátyás	Building on our previously published CARLA2NeRF pipeline, we propose a solution that is capable of reconstructing a street-level spatial representation of a road network using the 3D Gaussian Splatting (3DGS) method. Our solution takes as input monocular image streams rendered from a crowdsourcing simulation created in the CARLA simulator. We evaluate our novel reconstruction pipeline and compare its results with previously used representations. We also present our newly developed and implemented 3D Gaussian viewer and manipulation GUI that enables the registration of 3DGS models, and demonstrate its capabilities on our own CARLA-based dataset.
10:55	56	Virtual Reality-Based Tours: A Case Study on the Virtual Walkthrough of the Digital Twin of the University of Pannonia Zalaegerszeg University Center	Németh, Krisztián; Guzsvinecz, Tibor; Szűcs, Judit	This study examines the integration of spatial artificial intelligence in creating the digital twin of the University of Pannonia Zalaegerszeg University Center (PE-ZEK), designed primarily for virtual reality-based tours. Using scanning, we created a virtual replica of a significant part of the indoor and outdoor areas of the university center. With these tours, users can explore the campus remotely and virtually, promoting accessibility, interactivity, and deeper engagement. We evaluated the developed virtual environment based on the opinions of 18 university students, using the Presence Questionnaire 3.0 and the System Usability Scale. Our results highlight the transformative role of spatial artificial intelligence and virtual reality technology in redefining how environments are experienced and accessed, paving the way for research on spatial cognitive insights and other immersive experiences.
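
For readers less familiar with the scores quoted for Paper 15 in this session (RMSE and MAE for depth, IoU for segmentation), the following is a minimal sketch of their textbook definitions. It is not the authors' evaluation code; the function names and the toy values are illustrative assumptions only.

```python
import math


def rmse(pred, truth):
    """Root mean squared error between two equal-length lists of depths."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def mae(pred, truth):
    """Mean absolute error between two equal-length lists of depths."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def iou(pred_mask, truth_mask):
    """Intersection over Union of two equal-length binary masks."""
    intersection = sum(1 for p, t in zip(pred_mask, truth_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, truth_mask) if p or t)
    # Two empty masks agree perfectly; this convention is an assumption.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    # Toy depth values in meters, made up for illustration.
    predicted = [10.0, 12.0, 20.0, 31.0]
    reference = [10.0, 14.0, 20.0, 30.0]
    print(rmse(predicted, reference), mae(predicted, reference))
    print(iou([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Because RMSE squares each error before averaging, a few large depth errors raise it much more than they raise MAE, which is one reason the two are usually reported together.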

Time	Paper ID	Title	Authors	Abstract
14:00	17	LidPose: Real-Time 3D Human Pose Estimation in Sparse Lidar Point Clouds with Non-Repetitive Circular Scanning Pattern	Kovács, Lóránt; Bódis, Balázs Márk; Benedek, Csaba	In this paper, we propose a novel, vision-transformer-based end-to-end pose estimation method, LidPose, for real-time human skeleton estimation in non-repetitive circular scanning (NRCS) lidar point clouds. Building on the ViTPose architecture, we introduce novel adaptations to address the unique properties of NRCS lidars, namely, the sparsity and unusual rosette-like scanning pattern. The proposed method addresses a common issue of NRCS lidar-based perception, namely, the sparsity of the measurement, which needs balancing between the spatial and temporal resolution of the recorded data for efficient analysis of various phenomena. LidPose utilizes foreground and background segmentation techniques for the NRCS lidar sensor to select a region of interest (RoI), making LidPose a complete end-to-end approach to moving pedestrian detection and skeleton fitting from raw NRCS lidar measurement sequences captured by a static sensor for surveillance scenarios. To evaluate the method, we have created a novel, real-world, multi-modal dataset, containing camera images and lidar point clouds from a Livox Avia sensor, with annotated 2D and 3D human skeleton ground truth.
14:17	23	Spatial Calibration of 2D LiDARs Using a Cylinder	Tófalvi, Tamás; Kovács, Bandó; Hajder, Levente	The paper presents a new method for determining the relative pose of two 2D LiDARs. Competing methods use mutually perpendicular planes or sensors of other modalities, e.g. cameras, IMUs, or 3D LiDARs, to perform the calibration. In our work we examined the possibility of using cylinders for the calibration. By scanning the cylinder, we can extract spatial information about the relative position of the two-dimensional sensor with respect to the environment. From three such measurements taken with the sensor pair, their relative pose can be computed. In the paper we present the complete calibration algorithm and an analysis of its accuracy on both synthetic and real data, in comparison with other methods.
14:34	29	Directional Bounding Boxes for Oriented Object Detection	Molnár, Szilárd; Tamas, Levente; Kato, Zoltan	In crowded scenes, standard (or horizontal) bounding boxes (BB) are not ideal; hence oriented bounding boxes (OBB), which localize objects by a minimal enclosing box, have become popular. The orientation of such a bounding box does not necessarily correspond to the orientation of the detected object: for example, the front of an airplane or a car is not extracted by an OBB. Heading detection attempts to address this issue; however, the direction of non-rectangular objects cannot be described by one side of an OBB. Herein, we propose a novel method, the Directional Object Bounding Box (DOBB), which is capable of detecting the object's own direction together with its minimal enclosing box (OBB), yet independently from it. The proposed method is integrated into the popular YOLO object detector to create a single neural network that predicts minimal enclosing boxes, class probabilities, and object directions directly from full images in one evaluation. With a novel loss function, it can be trained end-to-end directly on oriented object detection. Comparative tests confirm its state-of-the-art performance on both man-made and natural objects.

Time	Paper ID	Title & Authors
15:50	4	Adaptive Sampling for Single Scatter Estimation in PET Varnyú, Dóra; Szirmay-Kalos, Laszlo This paper presents a Monte Carlo importance sampling algorithm for the scattering estimation of gamma photons generated at positron-electron annihilations. The paths of the photons of the generated pair form a polyline defined by the detector hits and the scattering points where one of the photons changed its direction. The values measured by detector pairs are then the integral of the contribution of such polyline paths. We consider the single scattering case, in which the polyline contains a single scattering point. This integral is evaluated with Monte Carlo quadrature, using a sampling density that at least approximately mimics the integrand, according to the concept of importance sampling. The detector points of the photon paths are sampled deterministically, while the scattering point is sampled randomly. Scattering points are sampled globally, i.e. a single polyline represents all annihilation events that occurred at any of its points, and line segments containing scattering points are reused for all detector pairs. The scatter estimation is incorporated into the Tera-tomo™ Positron Emission Tomography (PET) reconstruction software.
15:55	5	Segmentation metric misinterpretations in bioimage analysis Hirling, Dominik; Péter, Horváth Quantitative evaluation of image segmentation algorithms is crucial in the field of bioimage analysis. The most common assessment scores, however, are often misinterpreted, and multiple definitions coexist under the same name. Here we present the ambiguities of evaluation metrics for segmentation algorithms and show how these misinterpretations can alter the leaderboards of influential competitions. We also propose guidelines for how the existing problems could be tackled.
16:00	10	Benchmarking Deep Learning Models for Tumor Region Detection: A Step Toward Multi-Domain Robust Solutions Bolf, Márton; Hirling, Dominik; Iván, Zsanett; Péter, Horváth The usage of today's deep learning algorithms can significantly improve the processing speed of pathological Whole Slide Images (WSIs) while also extracting more information compared to manual analysis. However, varying imaging techniques and domain shift caused by morphologically different tumor types and unique staining protocols between laboratories make it difficult for these models to learn and predict efficiently. These issues can be alleviated by proper preprocessing of the WSIs and by creating domain robust algorithms with the correct selection of augmentation techniques and fine-tuned neural network parameters. In addition to image content, WSIs also include annotations along with essential metadata, all of which necessitate specialized algorithms for further analysis. Furthermore, the huge size of these images also makes it challenging to increase the processing speed without the involvement of additional computing capacities. Based on our current research, some deep learning models can achieve decent precision while maintaining high speed during predictions. Multiple approaches are present, each of which is capable of solving the task with high efficiency. Here, we present an in-depth comparison of the most promising methods to achieve the best possible accuracy and speed during predictions. By evaluating recent advances we propose a deep learning model along with proper WSI preprocessing techniques most suited for the tumor region detection tasks. These results will guide future research and help streamline mitosis detection and cancer research workflows by automating otherwise time-consuming activities.
16:05	13	Applying domain augmentation techniques at test time to histological images Botond, Zsigri The aim of this study is to automate a biological workflow as precisely as possible: our goal is to detect dividing cells in histological images, since determining them manually is extremely time-consuming, yet it is of key importance for characterizing a tumor during a pathological diagnosis. In this research we are primarily concerned not with how detection is performed but with refining its results, for which implementing test-time augmentation algorithms is a good option. The paper aims to review the relevant literature, to introduce new ideas to the field of test-time augmentation, and finally to evaluate the most important techniques on the specific problem.
16:10	41	SuperCUT, an unsupervised multimodal image registration with deep learning for biomedical microscopy István, Grexa Numerous imaging techniques are available for observing and interrogating biological samples, and several of them can be used consecutively to enable correlative analysis of different image modalities with varying resolutions and the inclusion of structural or molecular information. Achieving accurate registration of multimodal images is essential for the correlative analysis process, but it remains a challenging computer vision task with no widely accepted solution. Moreover, supervised registration methods require annotated data produced by experts, which is limited. To address this challenge, we propose a general unsupervised pipeline for multimodal image registration using deep learning. We provide a comprehensive evaluation of the proposed pipeline versus the current state-of-the-art image registration and style transfer methods on four types of biological problems utilizing different microscopy modalities. We found that style transfer of modality domains paired with fully unsupervised training leads to comparable image registration accuracy to supervised methods and, most importantly, does not require human intervention.
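
To make concrete the kind of ambiguity discussed in paper 5 (several definitions coexisting under one metric name), here is a small Python sketch of one well-known case: "mean IoU" computed per image and then averaged, versus IoU computed once over all pixels pooled together. This particular pair of definitions and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0

def per_image_mean_iou(pairs):
    """Definition A: compute IoU for every image, then average the scores."""
    return sum(iou(p, t) for p, t in pairs) / len(pairs)

def pooled_iou(pairs):
    """Definition B: accumulate intersection and union over all images first."""
    inter = union = 0
    for pred, truth in pairs:
        for p, t in zip(pred, truth):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0

# Toy data: one large object segmented perfectly, one tiny object missed.
pairs = [
    ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0]),
    ([1, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0]),
]
score_a = per_image_mean_iou(pairs)  # 0.5
score_b = pooled_iou(pairs)          # 4 / 6
```

The same predictions score 0.5 under one definition and about 0.67 under the other, so two methods can swap ranks depending on which definition a leaderboard silently uses.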

17:00-18:00 - Invited lecture: Zoltán Juhász - Searching for traces of ancient musical languages in present-day folk music cultures using self-learning algorithms

The depth of the connections between folk music cultures was first pointed out, among others, by the Hungarian-Chuvash-Cheremis-Turkish melody parallels of Kodály and Bartók. Because of the diversity of the cultures and the large number of melodies, extending these initial results is possible only through a numerical description of the melodies and their computer-based analysis. With the computer algorithms developed for a comprehensive exploration of these connections, we can now study about 60,000 melodies from 63 musical cultures. To determine the so-called "universal melody types" that are present in several cultures at once, we developed the "Self-Organizing Cloud" ("Önszervező Felhő") algorithm, a variant of unsupervised-learning artificial intelligence. We characterized each culture by the number of variants of its own types that fall into each of the 921 universal types obtained in this way, i.e. by 921-dimensional distribution vectors. The correlations of the variant counts determined per culture revealed groups of types that spread "in alliance". These groups can justifiably be regarded as ancient musical "languages" whose heritage can still be detected in today's musical cultures. We characterized the weights with which they are present using linear-combination models of the distribution vectors mentioned above. We also extended the analysis to archaeogenetic data, through which the movements and contacts of real ancient populations begin to emerge in the relationships between musical cultures.

Thursday

9:00-10:00 - Invited lecture: Giorgos Tolias - Visual representation and similarity for instance-level retrieval

Tuesday

Session #1: Object Detection, Tracking, and Sensor Fusion (chair: Csaba Beleznai)

Demo Session (chair: Csaba Beleznai)

Wednesday

Session #2: 3D Reconstruction and Scene Understanding (chair: Levente Hajder)

Kuba Attila and Csetverikov Dmitrij Prize Presentations (chair: Zoltan Kato)

Session #3: Medical Imaging and Analysis (chair: Péter Horváth)

Thursday

Session #4: Image Processing, Graphics, and Theoretical Foundations (chair: László Nyúl)

Session #5: Deep Learning for Computer Vision (chair: Tamás Szirányi)

Session #6: Robotics, Navigation, and SLAM (chair: András Majdik)

Friday

Session #7: Industrial Applications and Human-Robot Interaction (chair: Csaba Benedek)

Time	Paper ID	Title & Authors
10:30	44	Comparison of sensor fusion methods on an aerial platform equipped with a multispectral sensor Tizedes, László; Tokarjev, Andrej; Keszler, Anita; Majdik, Andras With the support of the Modernization Institute of the Hungarian Defence Forces, a drone-mountable platform has been built which, by combining modern sensors and information technology solutions with AI-supported sensor fusion, is suitable for aerial object detection. In the course of the work we studied the related literature and selected devices suitable for the task: a multispectral-thermal camera (Micasense AltumPT), a LiDAR (Livox Avia), and an embedded computer (NVIDIA Jetson Orin AGX). After reviewing the literature, we implemented and compared the algorithms judged to be usable (AI-supported low-level and high-level sensor fusion) and selected the one most suitable in terms of performance and efficiency. We assembled a software framework to make it easier to try out the algorithms and the hardware sensors. The framework is suitable for real-time sensor signal processing, so the real-time data streams of the sensors can be processed and previously recorded data can also be replayed. In the paper we present the implemented algorithms and tools in detail, together with measurements demonstrating the usability of the algorithms.
10:35	2	Sufficient Conditions for Topology-Preserving Parallel Reductions on the Face-Centered Cubic Grid Karai, Gábor; Kardos, Péter; Palagyi, Kalman Topology preservation is a crucial issue in parallel reductions that transform binary pictures by changing only a set of black points to white at a time. In this paper, we present sufficient conditions for topology-preserving parallel reductions on the three types of pictures of the unconventional 3D face-centered cubic (FCC) grid. One of them provides methods for verifying that a given parallel reduction always preserves the topology, while the remaining ones directly provide deletion rules for topology-preserving parallel reductions and make it possible to generate topologically correct thinning algorithms.
10:40	32	Pallet detection and 3D pose estimation via geometric cues learned from synthetic data Beleznai, Csaba; Reisinger, Lukas; Pointner, Wolfgang; Murschitz, Markus Vision-based object recognition is an important enabler for automating specific workflows in production and transportation scenarios. Locating and manipulating pallets as common functional objects represents a relevant robotic task in these domains. However, learning accurate neural models to estimate the location and pose of such objects is non-trivial. The main complexity stems from the underlying diversity of representing pallet objects: varied viewing conditions, frequent occlusions, self-occluded parts, diverse pallet materials jointly span a vast space of possible appearances. We present a solution which tackles the data diversity problem and the issue of occlusions. On one hand, we demonstrate how to rely on synthetic image pairs to compute geometry encoding stereo disparity images, highly independent from appearance variations and exhibiting a small synthetic-to-real data domain gap. On the other hand, we introduce a novel point- and line-segment-based voting scheme, yielding a strong support for object presence in case of occlusions and up to distances of 8m. We provide a quantitative evaluation of recognition accuracy for several network architectures using a manually fine-annotated multi-warehouse data-set. Based on the presented pallet recognition scheme, we also describe an automated forklift demonstrator, able to perform automated pallet pick-up and drop-off operations under diverse observation conditions.
10:45	34	Gradient-based Ellipse Fitting and 3D Circle Center Estimation Du, Xiaohe; Dániel, Baráth; Hajder, Levente In man-made urban environments, objects with circular curves, such as wheels, capacitors, and round buttons, are commonly found in a variety of scenes. It is well known that the projected contour of a 3D circle forms an ellipse in a camera image. This paper proposes a novel method to detect the circle projection in an image using a RANSAC-based oriented ellipse fitting algorithm and to determine the 3D center of the circle using two images. The test environment for the method is an automatic box unpacking system. In this system, a robotic arm first takes two images of a reel from different angles using a camera mounted on the robotic arm. The proposed oriented ellipse fitting algorithm is then applied, followed by center estimation on the images to locate the center of the reel in 3D. With the estimated 3D location of the reel center, the robotic arm can use special tools to grab the reel through the central hole. The proposed algorithms are compared to other ellipse fitting algorithms in terms of accuracy based on simulated points and images. They are also evaluated on real-world testing data.
10:50	42	Worker and driver drowsiness and attention monitoring open-source framework from state-of-the-art components Jánoki, Imre; Zarándy, Ákos In recent years, there has been a greater emphasis on safety than ever before at production lines and in the automotive sector, which has shaped the official regulations and requirements and generated demand for advanced technologies. These include the monitoring of worker and driver drowsiness and attention. Existing visual solutions are based on the detection of eyelid closure and blinking, yawning, head pose, analysis of the driving pattern and reaction to traffic, and the use of biological parameters like heart rate and heart rate variability based on photoplethysmography. In this paper, we introduce a pipeline incorporating several state-of-the-art components for attention and drowsiness detection, which includes the measurement of head pose, yawning, gaze and blinking characteristics, and we provide the source code under the GNU Public License. The provided software tool can be used for testing efficiency monitoring, optimization in manufacturing and increasing safety level.
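
For readers unfamiliar with the face-centered cubic grid used in paper 2, the following Python sketch shows one common coordinate model: the integer points whose coordinate sum is even, each with 12 nearest neighbors. The function names are hypothetical, and the sketch covers only this basic 12-adjacency, not the sufficient conditions or the picture types studied in the paper.

```python
from itertools import product

def is_fcc_point(p):
    """In this model an integer point belongs to the FCC grid if x + y + z is even."""
    x, y, z = p
    return (x + y + z) % 2 == 0

# The 12 nearest-neighbour offsets: exactly two coordinates are +1 or -1,
# and the remaining coordinate is 0.
FCC_OFFSETS = [d for d in product((-1, 0, 1), repeat=3)
               if sum(abs(c) for c in d) == 2]

def fcc_neighbors(p):
    """The 12 nearest grid neighbours of an FCC point p."""
    return [(p[0] + dx, p[1] + dy, p[2] + dz) for dx, dy, dz in FCC_OFFSETS]

neighbors_of_origin = fcc_neighbors((0, 0, 0))
```

Every offset changes the coordinate sum by an even amount, so the neighbors of a grid point are again grid points.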

Time	Paper ID	Title & Authors
14:00	30	Action Recognition in Video with Lightweight and Deeper (2+1)D ResNet Models Körmöczi, László; Nyúl, László Neural networks and deep learning techniques are used extensively in computer vision and image processing. Video classification can also be achieved using neural networks, e.g. with 3D convolution. Decomposing the processing of spatio-temporal data into separate spatial and temporal steps has been shown to be more efficient than 3D convolution. In this paper, we build and compare a shallow and a deeper model based on the (2+1)D ResNet architecture to reliably classify the active or inactive state of a subject in a video for a small time frame, and to reliably detect human disturbance in the video. The proposed models were successfully trained on a small set of training data, which makes them suitable for different tasks with minimal data preparation effort.
14:05	40	Augmentation Techniques in Digital Holography Terbe, Daniel; Zarandy, Akos; Bicsak, Barbara; Orzo, Laszlo This study presents a noise augmentation technique designed to enhance the robustness of state-of-the-art (SOTA) deep learning models against degraded image quality, a frequent challenge in long-term recording systems. Our method, demonstrated through the classification of digital holographic images, introduces a novel approach for synthesizing and applying random colored noise as data augmentation during neural network training – addressing the correlated noise patterns typically found in such images. Empirical results indicate that our technique maintains classification accuracy for high-quality images and significantly improves performance when dealing with noisy inputs without increasing training time. This advancement highlights the potential of our approach to augment data for deep learning models, enabling them to perform effectively in production environments with varied and suboptimal conditions.
14:10	48	Continuous Simulation-to-Real Transfer with Diffusion Models Béres, András; Gyires-Tóth, Bálint Reinforcement learning agents are traditionally trained and evaluated within simulated environments, limiting their ability to transfer knowledge to the unseen and more complex real world. Domain adaptation improves the performance of agents under domain shift, which makes it an important technique for transferring reinforcement learning agents trained in simulation to reality. In this work we propose a diffusion-based unpaired image-to-image translation method for visual domain adaptation, which generates more realistic alternatives to the simulated observations on the fly during training. We show that by adjusting a parameter we can continuously transform the simulator's original visuals into increasingly realistic versions. This can be used to strike a balance between realism and temporal coherence. We show that this method is capable of both increasing the robustness of the agent under different levels of realism and improving performance in the real world. We test the proposed method in a simulated self-driving environment with autonomous lane-following agents. The proposed method offers a novel way of decreasing the simulation-to-real gap through the use of generative diffusion models for unpaired image-to-image translation.
14:15	50	Deep Randomized Networks for Fast Learning Czúni, László; Rádli, Richárd Deep neural networks show a significant improvement over shallow ones in complex problems. Their main disadvantages are their memory requirements, the vanishing gradient problem, and the time-consuming search for the best achievable weights and other parameters. Since many applications (such as continuous learning) would need fast training, one possible solution is the application of sub-networks which can be trained very fast. Randomized single-layer networks became very popular due to their fast optimization, while their extensions to more complex structures could increase their prediction accuracy. In our paper we show a new approach to building deep neural models for classification tasks with an iterative, pseudo-inverse optimization technique. We compare the performance with a state-of-the-art backpropagation method and with the best-known randomized approach, called the hierarchical extreme learning machine. Computation time and prediction accuracy are evaluated on 12 benchmark datasets (including image datasets), showing that our approach is competitive in many cases.
14:20	51	Word and Image Embeddings in Pill Recognition Czúni, László; Rádli, Richárd; Vörösházi, Zsolt Pill recognition is a key task in healthcare and has a wide range of applications. In this study, we address the challenge of improving the accuracy of pill recognition in a metric learning framework. A multi-stream visual feature extraction and processing architecture, with multi-head attention layers, is used to estimate the similarity of pills. We introduce an essential enhancement to the triplet loss function that leverages word embeddings to inject textual pill similarity into the visual model. This improvement refines the visual embedding on a finer scale than conventional triplet loss models, resulting in higher accuracy of the visual model. Experiments and evaluations are carried out on a new, freely available pill dataset.
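
Paper 51 describes injecting textual similarity into a triplet loss. The abstract does not give the formulation, so the Python sketch below shows only the standard triplet loss together with one hypothetical way to let word embeddings influence it: shrinking the margin when the anchor and the negative have similar textual descriptions. The rule in `text_aware_margin` and all names are assumptions for illustration, not the authors' method.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss: the positive must be closer than the negative by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def text_aware_margin(word_anchor, word_negative, base_margin=1.0):
    """Hypothetical rule: textually similar pills may stay closer in the visual space."""
    similarity = max(0.0, cosine_similarity(word_anchor, word_negative))
    return base_margin * (1.0 - similarity)

# Toy visual embeddings and toy word embeddings (all values made up).
anchor, positive, negative = (0.0, 0.0), (0.0, 1.0), (1.0, 0.0)
margin_unrelated = text_aware_margin((1.0, 0.0), (0.0, 1.0))  # unrelated texts
margin_same_text = text_aware_margin((1.0, 0.0), (1.0, 0.0))  # identical texts
loss_unrelated = triplet_loss(anchor, positive, negative, margin_unrelated)
loss_same_text = triplet_loss(anchor, positive, negative, margin_same_text)
```

With this rule the same visual triplet is penalized when the texts are unrelated and left alone when they match, which is one way a text signal can act on a finer scale than a fixed margin.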

Time	Paper ID	Title & Authors
15:50	16	Temporal upsampling of LiDAR point clouds based on optical flow and expansion Rozsa, Zoltan; Szirányi, Tamás This paper proposes a framework that enables the online generation of virtual point clouds, relying only on the preceding camera image and point cloud and on the current camera measurements. Continuous use of the system generating virtual LiDAR measurements enables temporal upsampling of the point clouds. The only requirement of the system is a camera whose measurement frequency is higher than that of the LiDAR mounted on the same vehicle, which is generally the case. The method first predicts the optical flow of the available camera images. It then estimates the 3D scene flow by complementing this with optical expansion. Next, a ground plane is fitted to the preceding LiDAR point cloud. Finally, the estimated scene flow is applied to the previously measured object points to create the new point cloud. The effectiveness of the framework is demonstrated by the fact that it currently delivers state-of-the-art performance on the popular KITTI dataset.
15:55	21	Cooperative visual-inertial localization of teams with floor plan extraction Gazdag, Sándor; Pásztornicky, Dániel; Jankó, Zsolt; Szirányi, Tamás; Majdik, András We present a real-world example of a system that performs joint localization and mapping of multiple agents inside a building. The proposed solution processes the odometry and 3D point cloud data collected by agents moving in the building so that, following a global optimization, it automatically generates the floor plan of the building, on which the trajectories of the agents can also be displayed. The hardware worn by the agents includes a low-cost integrated sensor consisting of a stereo camera and an IMU (inertial measurement unit), as well as a platform with an embedded GPU. The capabilities of the system are illustrated by experiments carried out in a real environment.
16:00	24	A survey of machine-learning-based horizon line estimation methods Poór, Máté Bálint; Hajder, Levente; Sarosi, Andras Horizon line estimation plays a fundamental role in various areas of computer vision, including autonomous vehicles, robotics, and augmented reality. Recent advances in deep learning have introduced robust methods that can accurately estimate the horizon line even in environments that do not satisfy the Manhattan-world assumption. This survey provides a comprehensive analysis of horizon line estimation techniques based on machine learning, with particular attention to methods designed for video sequences. We categorize the available approaches by architecture, highlighting the development of the field from traditional algorithms through convolutional neural networks (CNNs) to more advanced attention-mechanism and transformer-based architectures. To evaluate the methods, we carried out an empirical comparison on a video dataset created for autonomous vehicles, examining their performance against the many challenges that occur in real life. In addition, we analyze how horizon line estimates can be used to stabilize videos, thereby increasing the accuracy of the planar motion task. We examine the amount of rotational drift along the entire trajectory, giving insight into the effect of each method on planar motion estimation. Our aim with this survey is to present the field, its development, and its current state, as well as the advantages and limitations of existing methods, thereby also helping researchers working in the field.
16:05	35	Robust Visual Localization in Large-Scale Environments Using Hierarchical Point Cloud Registration Józsa, Csaba; Bóta, Attila Localization systems often rely on visual information, which can be compromised by challenging conditions, such as variable lighting, dynamic objects, and repetitive patterns. To improve robustness beyond single-image approaches, we model localization as a point cloud registration problem, using multiple images to construct 3D representations of the environment. However, achieving consistent localization accuracy in large, complex spaces like office buildings and warehouses remains challenging due to redundancy and frequent changes in local structure. In this work, we present a comprehensive approach that leverages multi-image point cloud data and introduces a hierarchical matching strategy tailored for large-scale environments. This hierarchical method utilizes coarse-to-fine matching to progressively narrow down search space, optimizing computational efficiency while maintaining high accuracy. Our approach is further supported by a new benchmark dataset built on the LaMAR dataset, featuring over 100 hours of data across 80,000 square meters, captured over two years. This dataset addresses key gaps in current resources by providing precise ground truth, large-scale scenes, and structural variation. By introducing a structured framework that combines multi-image point cloud construction, hierarchical matching, and extensive, high-quality data, our system enhances the robustness and precision of visual localization in dynamic and expansive environments. This work offers significant advancements for applications across augmented reality, autonomous navigation, and industrial digital twin systems.
16:10	37	Automatic, board-based LiDAR-camera calibration Tokarjev, Andrej; Keszler, Anita Camera and LiDAR fusion plays an increasingly important role in perception systems in various application areas, such as autonomous driving or intelligent surveillance. For such systems to operate effectively, accurate calibration of the sensors is essential. This paper presents a method for camera and LiDAR calibration based on detecting a checkerboard. The novelty of the algorithm lies partly in detecting the checkerboard in the point cloud and partly in defining the error function used during calibration. It is a fully automated calibration procedure tuned for practical use, which requires no manual intervention while the software is running. The algorithm was tested under both indoor and outdoor conditions, and it proved to work with several different LiDAR and camera pairings.
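
LiDAR-camera calibration, as in paper 37, ultimately estimates a rotation and translation that map LiDAR points into the camera frame. The Python sketch below shows a generic way to score a candidate extrinsic pair with a mean reprojection error under an undistorted pinhole model. The error function actually used in the paper is not described in the abstract, so this formulation, the intrinsics tuple `K = (fx, fy, cx, cy)`, and all numeric values are assumptions.

```python
import math

def transform(R, t, p):
    """Map a 3D LiDAR point p into the camera frame: R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def project(K, p_cam):
    """Pinhole projection without lens distortion; K = (fx, fy, cx, cy)."""
    fx, fy, cx, cy = K
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)

def mean_reprojection_error(K, R, t, lidar_points, pixels):
    """Average pixel distance between projected LiDAR points and detected corners."""
    total = 0.0
    for p, (u, v) in zip(lidar_points, pixels):
        pu, pv = project(K, transform(R, t, p))
        total += math.hypot(pu - u, pv - v)
    return total / len(lidar_points)

# Toy setup: identity rotation, two checkerboard corners seen by both sensors.
K = (500.0, 500.0, 320.0, 240.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [(1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
pixels = [project(K, p) for p in points]

error_correct = mean_reprojection_error(K, R, (0.0, 0.0, 0.0), points, pixels)
error_shifted = mean_reprojection_error(K, R, (0.1, 0.0, 0.0), points, pixels)
```

A calibration routine would search for the `R` and `t` that minimize such an error over many board poses; here a 10 cm translation error at 2 m depth already shifts the projections by 25 pixels.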

Time	Paper ID	Title & Authors
10:30	28	Human-Robot Cooperative Robot Cell Monitoring: Authentication and Gesture Control with Stereo Vision Kovács, Gábor Integrating human-robot interaction technologies into industrial automation is becoming increasingly indispensable for improving productivity and safety. In this paper we propose a new approach to these challenges that applies stereo vision and gesture control in cooperative robot cells. The system enables simple recognition of operators and real-time verification of the tasks performed. Key features of the system include gesture control, which allows operators to control the system intuitively. Thanks to stereo vision, the movements of the operator and the tasks performed with the hands can be tracked accurately within the workspace. We then present the detailed architecture of the system, the environment, and the performance achieved in a real environment. In light of the results, it can be stated that, despite the simple and quickly deployable hardware, the presented approach is able to increase the safety and efficiency of the robot cell, and it also offers the user a simpler and better solution.
10:35	19	Improvements on rendering cross-sections of solids Fridvalszky, András; Szécsi, László; Szirmay-Kalos, Laszlo When observing or manipulating triangle meshes, we would often like to inspect the inner structure of the mesh. CAD applications, where the exact structure of the mesh is crucial, are the primary target of this feature. Other general-purpose modeling applications can benefit too, and it can also be useful when investigating segmented organs in medical imaging that have already been converted to triangle meshes. Popular applications normally provide tools for this, but they usually work in an offline fashion. That is, they compute the mesh parts that should be clipped based on user interactions, then create and render temporary meshes without them. This method can be acceptable when the CPU is fast enough, the model is small and the selection is static. In other cases, especially with larger meshes, the delays can become noticeable. Dynamically clipping parts of the model is basically a built-in feature of a modern GPU, but there is no automatic way to fill in the cross-sections where the clipping geometry and the solid intersect each other. There are existing techniques to resolve this, but they have not been updated to the modern features of the GPU. In this paper we propose improvements on rendering cross-sections of the model, with additional features for ease of use, and discuss implementations on different platforms.
10:40	27	AYANet: A Gabor Wavelet-based and CNN-based Double Encoder for Building Change Detection in Remote Sensing Osa, Priscilla; Zerubia, Josiane; Kato, Zoltan The main challenge in bitemporal building change detection (BCD) in remote sensing (RS) is to detect the relevant changes that are related to buildings, while ignoring changes induced by other types of land cover as well as by varying environmental conditions during the sensing process. In this paper, we propose a new BCD model with a double encoder architecture. The Gabor wavelet-based encoder aims to highlight the characteristics of buildings in RS imagery, i.e., their comparatively more regular and repetitive texture than that of other objects in RS images. This Gabor encoder is used in addition to the convolutional encoder, which extracts other meaningful and high-level information from the images. Moreover, we also propose a Feature Conjunction Module to efficiently combine the extracted features by characterizing possible types of changes. Comparative results with state-of-the-art models on 3 different BCD datasets (LEVIR-CD, S2Looking, and WHU-CD) confirm that the proposed model outperforms current BCD methods in producing a highly accurate change map of buildings. Our code is available at https://github.com/Ayana-Inria/AYANet.
10:45	33	A Khalimsky-type solution to the topological problems of the triangular grid Nagy, Benedek Topological paradoxes are well known in digital geometry and digital image processing. The most studied paradoxes occur on the square grid, which is why digital versions of the Jordan curve theorem require special care. Briefly, one such paradox can be pictured as the two diagonals of a chessboard, regarded as lines, passing through each other without an intersection point. The situation is very similar on the triangular grid: here diamond chains of different directions pass through each other without sharing a pixel. In this paper (based on our result presented at the DGMM 2024 conference), we propose a new topology for the triangular grid that solves the above-mentioned problem. Like the topology known as the Khalimsky topology on the square grid, the new topology takes different neighbors into account for the different triangle pixels.
10:50	38	Stereo Hatching Render for Virtual Reality Applications Karpati, Attila; Karpati, Viktoria; Szécsi, László The hatching technique was originally used by artists to create tonal drawings. The basis of the technique is creating shading effects with pencil strokes. The outcome mainly depends on factors such as the angle and density of the strokes. Non-photorealistic renderers aim to reproduce these effects using graphics algorithms, and already do so with considerable success. Compared to traditional applications, however, virtual reality applications present new challenges. In this paper we aim to bring existing hatching algorithms into VR settings, address the challenges, and summarize our findings.
10:55	45	Moment preserving tomographic image reconstruction model Lukic, Tibor; Balázs, Péter Shape descriptors provide valuable prior information in many tomographic image reconstruction methods. Such descriptors include, among others, centroid, circularity, orientation, and elongation. Shape descriptor measures are often analytically expressed as a composition of certain geometric moments. Building upon this fact, this paper suggests preserving the value of a specific geometric moment in the reconstruction process, instead of preserving entire descriptors, as has been suggested so far. Reconstructions from two natural projection directions (vertical and horizontal) are considered with special attention. The provided theoretical analysis demonstrates that preserving the value of a specific geometric moment, provided as prior information for the reconstruction process, simultaneously ensures the preservation of the true measures of all four abovementioned descriptors. Based on this result, a novel regularized energy minimization reconstruction model is proposed. The minimization task of the new model is solved using a gradient-based optimization algorithm. Performance evaluation of the proposed method is supported by experimental results obtained through comparisons with other well-known reconstruction methods.
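
As background to paper 45, which relates shape descriptors to geometric moments and considers horizontal and vertical projections, the Python sketch below computes geometric moments of a small binary image and shows that the centroid can be obtained from the row and column sums alone. This is a generic property of moments that involve only one coordinate; it is not the reconstruction model of the paper, and the abstract does not state which moment the authors preserve. All names and the toy image are assumptions.

```python
def moment(img, p, q):
    """Geometric moment m_pq = sum over pixels of x^p * y^q * img[y][x]."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))

def projections(img):
    """Row sums and column sums, i.e. the two axis-aligned projections."""
    row_sums = [sum(row) for row in img]
    col_sums = [sum(col) for col in zip(*img)]
    return row_sums, col_sums

def centroid_from_projections(row_sums, col_sums):
    """Centroid (m10 / m00, m01 / m00) computed without access to the image."""
    m00 = sum(col_sums)
    m10 = sum(x * c for x, c in enumerate(col_sums))
    m01 = sum(y * r for y, r in enumerate(row_sums))
    return (m10 / m00, m01 / m00)

image = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
row_sums, col_sums = projections(image)
centroid = centroid_from_projections(row_sums, col_sums)
```

Moments that mix both coordinates, such as m11, are not fixed by the two projections, which is one reason supplying a moment value as prior information can add something the projections do not already contain.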