Embedded smart sound detector

Introduction

An electronic audio detection device is a system designed to listen to the environment, identify relevant events, and operate for long periods without human intervention. These types of devices are used in a wide variety of contexts, such as biodiversity monitoring, early fault detection in industrial environments, environmental surveillance, and perimeter security, where acoustic information provides signals that other sensors cannot capture.

When these systems are deployed in remote areas or in the field, design constraints become particularly demanding. Power availability is limited, connectivity is often intermittent or non-existent, and equipment must operate reliably under variable environmental conditions. In this context, every decision, from hardware architecture and audio capture to local processing, communication, and device form factor, is heavily influenced by autonomy and power consumption.

Designing an embedded smart sound detector therefore involves much more than training a machine learning model. It requires coherently integrating hardware, software, power, and real-world use, balancing detection capability, robustness, and autonomous operation over extended periods.

This article traces the evolution of an autonomous sound detector, from an initial functional implementation aimed at validating the concept, to a dedicated hardware design and a cabinet designed for real-world use in the field. Throughout the process, we present the technical and engineering lessons learned that guided each of the design decisions.

First stage: concept validation

The first version of the detector was implemented on an ESP32-S3, with a clear objective: to quickly validate the viability of a complete embedded acoustic detection system.

Main functions:

Digital audio capture (I²S / PDM)
Local processing and inference
Event logging on SD card (FATFS)
Standalone operation

This stage confirmed that it was possible to close the entire audio → inference → event flow directly on the device, without relying on external infrastructure.
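The flow above can be sketched in a few lines. This is a minimal host-side illustration in Python, not the project's firmware: the energy gate, the thresholds `gate_rms` and `min_score`, the `Event` record, and `toy_classifier` are all assumptions introduced here to show how a cheap check can decide when the more expensive inference step runs.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Event:
    frame_index: int  # position of the frame in the stream
    label: str        # class name returned by the classifier
    score: float      # classifier confidence in [0, 1]


def rms(frame: List[float]) -> float:
    """Root-mean-square level of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect(
    frames: Iterable[List[float]],
    classify: Callable[[List[float]], Tuple[str, float]],
    gate_rms: float = 0.05,   # hypothetical energy gate
    min_score: float = 0.6,   # hypothetical confidence threshold
) -> Iterator[Event]:
    """Audio -> inference -> event: yield one Event per confident detection."""
    for index, frame in enumerate(frames):
        if rms(frame) < gate_rms:
            continue  # quiet frame: skip inference entirely
        label, score = classify(frame)
        if score >= min_score:
            yield Event(index, label, score)


def toy_classifier(frame: List[float]) -> Tuple[str, float]:
    """Hypothetical stand-in for the ML model: louder frames score higher."""
    return "bird", min(1.0, 2.0 * rms(frame))


# Three synthetic frames: silence, a sine burst, silence.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 10 * n / 160) for n in range(160)]
events = list(detect([silence, tone, silence], toy_classifier))
```

Only the middle frame passes the gate, so `events` holds a single detection. On the device the same structure would be driven by frames arriving from the microphone, with the event written to the SD card.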

Participation of specialists and fieldwork

A key aspect of the development was the joint work with specialists in biodiversity and bioacoustics, whose contribution was fundamental in guiding the system towards a real application: the detection of birds in the field. Far from a purely laboratory-based approach, the project incorporated criteria for real use, installation conditions, and validation in natural environments from the early stages.

To this end, we worked with the BEAP (Biology and Ecology of Patagonian Animals) research group at the Institute for Research on Biodiversity and the Environment (INIBIOMA–CONICET/UNCo). Within this framework, Lic. Mariana Lucía Bocelli actively collaborated in the selection of sectors and conditions for the prototype’s location, with the aim of maximizing the probability of species detection. She also contributed acoustic capture material previously obtained with other equipment, which proved key to the validation of the system and the comparative analysis of results for this particular application.

At the same time, researcher Laila Daniela Kazimierski (CONICET) participated in defining modeling criteria and the general approach to the detection model design, strengthening the link between embedded instrumentation and the machine learning stage. This exchange allowed the hardware capabilities to be aligned with the actual needs of acoustic analysis, avoiding decisions that were disconnected from the context of use.

Working with the BEAP team not only allowed us to test the detector in a specific application, but also generated additional ideas and requirements that were integrated into the system design and that open the door to future uses and further studies, reinforcing the device’s extensibility for scientific research.

Limitations encountered in the prototype

Validation in real-world scenarios allowed us to identify a key issue: energy consumption.

Although the ESP32-S3 is an excellent platform for rapid prototyping and connected applications, it is not designed as the main microcontroller for very low-power autonomous devices that must operate for long periods in the field.

In this context, the main factors that drove the evolution of the design were:

High energy consumption for extended operation scenarios
Low-power states poorly suited to long standby cycles with controlled wake-up

More than a problem with the project, this marked a natural limitation of the platform and made it clear that, for an autonomous acoustic detector, it was necessary to migrate to a microcontroller designed from the outset with a focus on energy efficiency and system control.

This conclusion was the starting point for the next stage of development.
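To see why sleep current matters so much in a duty-cycled device, a rough battery-life estimate helps. The sketch below is only arithmetic under stated assumptions: every current, timing, and capacity value is a hypothetical placeholder, not a measurement of the ESP32-S3, the STM32U5, or this project's boards.

```python
def average_current_ma(active_ma: float, sleep_ma: float,
                       active_s: float, period_s: float) -> float:
    """Time-weighted average current over one wake/sleep period."""
    if not 0 < active_s <= period_s:
        raise ValueError("active_s must be in (0, period_s]")
    duty = active_s / period_s
    return active_ma * duty + sleep_ma * (1.0 - duty)


def battery_life_days(capacity_mah: float, avg_ma: float,
                      usable_fraction: float = 0.8) -> float:
    """Days of operation, derating the nominal capacity by usable_fraction."""
    return capacity_mah * usable_fraction / avg_ma / 24.0


# Hypothetical profiles: same active current, different sleep current.
profiles = {
    "higher sleep current": {"active_ma": 50.0, "sleep_ma": 1.0},
    "lower sleep current": {"active_ma": 50.0, "sleep_ma": 0.01},
}

for name, p in profiles.items():
    # Assumed duty cycle: awake 2 s out of every 600 s.
    avg = average_current_ma(p["active_ma"], p["sleep_ma"], 2.0, 600.0)
    days = battery_life_days(2000.0, avg)  # assumed 2000 mAh battery
    print(f"{name}: {avg:.3f} mA average, about {days:.0f} days")
```

With a short active window, the average current is dominated by the sleep term, which is why the quality of the low-power states weighs more heavily than peak consumption in this kind of device.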

Migration to STM32U595: a qualitative leap forward

With the concept validated, the system was migrated to an STM32U595RIT6, prioritizing low power consumption, system control, and a cleaner architecture.

Reasons for migration:

True ultra-low power and fine control of low-power states
STM32 ecosystem geared toward end devices
Greater control over timing and resources

Detector architecture

The STM32U5-based version integrates:

Audio capture: Mono or stereo audio from an I²S digital microphone.
Local processing: Audio inference (ML) execution on STM32U595RIT6 microcontroller, optimized for low power consumption.
Local storage: Event and log recording on SD card (FATFS).
Local timestamp: External RTC for reliable timestamping even in low-power cycles.
External communication: Event transmission via LoRaWAN.
Auxiliary sensors: I²C interface for auxiliary sensors, e.g., an SHT30 for reading ambient temperature.
Power supply: Battery operation.
Power management: Battery charging via solar panel, with integrated control and protection.
Expansion interface: Communication interfaces for connecting a second device dedicated to image capture (e.g., ESP32-S3-CAM).
Debugging and development: SWD + UART for debugging and diagnostics.
Visual indicators: Status LEDs for recording/inference/error.
Low power consumption: Component selection and architecture geared towards ultra-low power consumption, with sleep modes and event-driven wake-up.

The system was designed to operate autonomously and efficiently.
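As an illustration of how the timestamp, detection, and auxiliary sensor reading could come together in one log record, here is a small host-side sketch in Python. The column names, the ISO 8601 UTC timestamp format, and the number formatting are assumptions made for this example; the actual on-card format is not described here, and the firmware itself would write through FATFS rather than Python's `csv` module.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterable, List, TextIO, Tuple

# Hypothetical column layout for the event log.
FIELDS = ["timestamp_utc", "label", "score", "temperature_c"]

EventTuple = Tuple[datetime, str, float, float]


def format_event_row(ts: datetime, label: str, score: float,
                     temperature_c: float) -> List[str]:
    """Turn one detection into fixed-format text fields."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [stamp, label, f"{score:.2f}", f"{temperature_c:.1f}"]


def write_events(stream: TextIO, events: Iterable[EventTuple],
                 include_header: bool = True) -> None:
    """Write events as CSV lines, optionally preceded by a header row."""
    writer = csv.writer(stream, lineterminator="\n")
    if include_header:
        writer.writerow(FIELDS)
    for ts, label, score, temperature_c in events:
        writer.writerow(format_event_row(ts, label, score, temperature_c))


# Example: one detection stamped by the (simulated) RTC.
buffer = io.StringIO()
write_events(buffer, [
    (datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc), "bird", 0.87, 12.5),
])
log_text = buffer.getvalue()
```

A plain-text, one-line-per-event format of this kind keeps the card readable on any computer after retrieval from the field, at the cost of more bytes per record than a binary log.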

Dedicated hardware design

The project evolved into a dedicated PCB, designed specifically for this use case rather than being adapted from generic boards. The focus was on energy-efficient architecture, with battery-based power supply, the ability to incorporate solar panel charging, and careful selection of low-power components.

In addition, the design clearly defined the system interfaces, facilitating both integration and maintenance, and left room for future expansions without compromising the original architecture.

This stage marked a turning point: the transition from a functional prototype to a controlled and coherent design for a real autonomous device.

Cabinet: closing the device cycle

From the outset, the board was designed with the possibility of integration into standard commercial cabinets in mind, seeking flexibility and practicality for different usage scenarios.

At the same time, and as part of the project’s maturation process, a custom 3D-printed cabinet was also developed, optimized specifically for the board and designed for short runs. This approach allowed us to take full advantage of the hardware design, making the device smaller and precisely positioning the microphone, connectors, SD card, LEDs, and mounting points.

Beyond the mechanical solution itself, the cabinet played a fundamental role: it brought the project to fruition as a device, allowing the detector to be evaluated not only as a functional electronic board, but as a complete system ready for field testing.

Key learnings

Some lessons learned from the process:

An acoustic detector is a complete system, not just a model.
Energy consumption influences all decisions.
Physical form influences usability and deployment.
Iterating quickly at the beginning prevents errors later on.

From prototype to actual deployment

The development of this embedded sound detector shows that autonomous acoustic detection is a comprehensive engineering problem, where the machine learning model coexists with decisions about hardware, power, communication, and the physical form of the device.

The resulting prototype can now be considered complete and functional, ready to be used as a basis for real-world applications. The next step is to identify specific problems where an intelligent audio detector can add value and be deployed at scale.

In scenarios where the application requires it, the system can be integrated as a node in a LoRaWAN network, reporting its detections to a gateway and enabling distributed and scalable monitoring schemes, for example on infrastructures such as Atheling.
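LoRaWAN uplinks carry small payloads, so detections are usually packed into a few bytes rather than sent as text. The sketch below shows one way to do that in Python. The 8-byte layout, the field scaling, and the function names are assumptions made for illustration, not the payload format used by the project.

```python
import struct
from typing import Tuple

# Hypothetical 8-byte big-endian layout:
#   uint32  Unix timestamp in seconds
#   uint8   class identifier
#   uint8   confidence scaled to 0..255
#   int16   temperature in hundredths of a degree Celsius
PAYLOAD = struct.Struct(">IBBh")


def encode_event(timestamp_s: int, class_id: int, score: float,
                 temperature_c: float) -> bytes:
    """Pack one detection into a fixed-size binary payload."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    return PAYLOAD.pack(timestamp_s, class_id,
                        round(score * 255), round(temperature_c * 100))


def decode_event(payload: bytes) -> Tuple[int, int, float, float]:
    """Inverse of encode_event, as a network-side decoder might apply it."""
    timestamp_s, class_id, score_byte, temp_centi = PAYLOAD.unpack(payload)
    return timestamp_s, class_id, score_byte / 255, temp_centi / 100


# Round trip with example values.
packet = encode_event(1_700_000_000, 3, 0.8, -5.25)
decoded = decode_event(packet)
```

The confidence is quantized to one byte, so the decoded value matches the original only to within 1/255. That is a deliberate trade of precision for airtime.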

Beyond the specific use case, the project leaves behind an autonomous and extensible detector, ready to evolve from prototype to real-world deployment.

Written by Alejandro Casanova

For further inquiries, contact us: info@emtech.com.ar