FusionLander is designed as a multimodal-ready system, though the version tested in this study uses RGB imagery alone.
As drones are used more widely at low altitudes, safe emergency landing is becoming a more practical concern. If something goes wrong mid-flight, the aircraft needs to decide quickly where to land without hitting obstacles or landing on unsafe ground.
Existing methods rely mainly on standard RGB cameras. That can work well in clear conditions, but colour imagery is not always enough when a scene is hard to interpret: a surface may look safe without actually being flat, and obstacles may not stand out by appearance.
The idea behind FusionLander is that different sensor types can fill in different parts of the picture. RGB images provide texture and semantic detail, thermal data can highlight temperature differences, and LiDAR can supply more direct geometric depth. The broader architecture is built with that in mind, although the paper tests only the RGB branch.
How FusionLander Works
The system is built around two linked tasks: semantic segmentation and depth estimation. The first labels each part of the scene by terrain class, while the second estimates per-pixel depth, recovering the scene's geometric structure.
Together, those outputs are used to judge whether a landing area is suitable.
The authors describe FusionLander as a parallel architecture that processes different sensor modalities in separate branches and fuses them later. In the present study, however, only the RGB branch is implemented and evaluated, using the Varied Drone Dataset (VDD), which provides aerial RGB images with pixel-level semantic annotations but no thermal or LiDAR data.
The network is trained in a multi-task setting, with segmentation and depth jointly learned via a shared backbone. For segmentation, the training objective combines cross-entropy, Dice, and edge-aware boundary losses. For depth, it uses scale-invariant logarithmic, L1, and gradient-matching losses.
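The paper names the individual loss terms but not how they are weighted, so the following is only a minimal sketch of what such a combined multi-task objective can look like. The weights, function names, and the numpy formulation are illustrative assumptions (the edge-aware boundary, L1, and gradient-matching terms are omitted for brevity); this is not the authors' implementation.

```python
import numpy as np

def cross_entropy(probs, target, eps=1e-12):
    """Pixel-averaged cross-entropy against a one-hot target."""
    return -np.mean(np.sum(target * np.log(probs + eps), axis=-1))

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss: 1 minus twice the overlap over the total mass."""
    inter = np.sum(probs * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(probs) + np.sum(target) + eps)

def silog_loss(pred_depth, gt_depth, lam=0.5, eps=1e-12):
    """Scale-invariant logarithmic depth loss (standard Eigen-style form)."""
    d = np.log(pred_depth + eps) - np.log(gt_depth + eps)
    return np.mean(d ** 2) - lam * np.mean(d) ** 2

def multitask_loss(seg_probs, seg_target, pred_depth, gt_depth,
                   w_ce=1.0, w_dice=1.0, w_silog=1.0):
    # Hypothetical equal weighting; the paper lists these loss families
    # but does not report the exact weights assumed here.
    return (w_ce * cross_entropy(seg_probs, seg_target)
            + w_dice * dice_loss(seg_probs, seg_target)
            + w_silog * silog_loss(pred_depth, gt_depth))
```

A near-perfect prediction drives all three terms toward zero, which is the sanity check one would run before wiring such a loss into training.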
The system then turns those predictions into a landing assessment. It identifies terrain classes treated as safer for landing, such as roads and rooftops, separates them from unsafe regions, and combines that information with depth estimates to generate landing-feasibility maps.
The image is also divided into grid cells, with each cell assessed using safe-area thresholds and median depth values to produce local landing recommendations.
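The grid-cell assessment described above can be sketched as follows. The class IDs, the safe-pixel fraction, and the depth threshold are hypothetical placeholders for illustration; the paper specifies the mechanism (per-cell safe-area thresholds and median depth) but not these particular values.

```python
import numpy as np

# Hypothetical values for illustration only.
SAFE_CLASSES = {1, 2}        # e.g. road, rooftop
SAFE_FRACTION_MIN = 0.9      # minimum share of safe pixels per cell
MAX_MEDIAN_DEPTH = 30.0      # reject cells whose surface is too far away

def landing_grid(seg, depth, cell=4):
    """Divide the frame into cell x cell blocks and flag candidate cells.

    seg   : (H, W) integer class map from the segmentation head
    depth : (H, W) per-pixel depth estimates
    Returns a boolean grid; True marks a locally recommended landing cell.
    """
    h, w = seg.shape
    gh, gw = h // cell, w // cell
    ok = np.zeros((gh, gw), dtype=bool)
    for i in range(gh):
        for j in range(gw):
            s = seg[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            d = depth[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            safe_frac = np.isin(s, list(SAFE_CLASSES)).mean()
            ok[i, j] = (safe_frac >= SAFE_FRACTION_MIN
                        and np.median(d) <= MAX_MEDIAN_DEPTH)
    return ok
```

Thresholding per cell rather than per image is what yields the localised recommendations the paper describes: one clear rooftop can be flagged even when most of the frame is unsafe.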
Results of the FusionLander Study
The full multimodal version remains future work, but the RGB-only implementation performed well. The model reached 88.31% mean Intersection-over-Union for semantic segmentation and reported an absolute relative error of 0.086 against proxy depth.
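For readers unfamiliar with the two headline metrics, here are their standard definitions in code. These are the conventional formulas, not the authors' evaluation scripts, whose exact implementation details may differ.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union, averaged over classes that appear."""
    ious = []
    for c in range(num_classes):
        p, g = (pred == c), (gt == c)
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

def abs_rel(pred_depth, gt_depth):
    """Mean absolute relative depth error: mean(|pred - gt| / gt)."""
    return float(np.mean(np.abs(pred_depth - gt_depth) / gt_depth))
```

An AbsRel of 0.086 therefore means the depth predictions deviate from the proxy targets by about 8.6% on average, which is why the pseudo-ground-truth caveat discussed below matters for interpreting that number.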
The qualitative results are arguably just as important as the headline numbers. The model appears to preserve terrain boundaries well, capture both gradual and abrupt depth changes, and produce fused maps that distinguish safer landing regions from hazardous ones in a fairly intuitive way. Rather than marking broad areas, it generates more localised landing suggestions.
Compared with baseline models, the gains are modest rather than dramatic, but they are in the right direction. That makes the paper more measured than some of the language often used around AI-based perception systems.
Yet to Demonstrate Full Multimodal Fusion
The paper does not demonstrate full multimodal fusion in practice. Although FusionLander is presented as a system capable of accommodating RGB, thermal, and LiDAR inputs, the experiments reported here are limited to RGB imagery.
There is a second limitation around depth. Because the VDD dataset does not include real depth annotations, the authors use pseudo-ground-truth depth generated by a pre-trained teacher model. In other words, the reported depth accuracy reflects agreement with a proxy estimate, not direct validation against sensor-measured depth, such as LiDAR ground truth.
The paper is also fairly careful about generalization. The results are promising across the kinds of scenes represented in the dataset, including variation in content and lighting, but that is not the same as showing performance across all operating conditions.
Different altitudes, seasons, weather conditions, and camera settings still need further validation.
Future Work for Drone Safety
The authors present FusionLander less as a finished multimodal landing system and more as a strong starting point. Future work includes testing with real thermal and LiDAR inputs, validating on broader datasets, and integrating landing-zone detection more closely with autonomous path planning.
That leaves the paper with a clear, fairly credible message: an RGB-only version of FusionLander can already combine semantic understanding and depth estimation to support emergency landing assessment, while the larger multimodal vision remains to be tested.
Journal Reference
Rumman K. M., Riker A., et al. (2026). A Deep Learning Framework for Emergency Drone Landing Zone Detection. IEEE Access, 1-16. DOI: 10.1109/ACCESS.2026.3675996