AES E-Library Search Results

Search Results (Displaying 1-10 of 19 matches)

Shot-to-Shot Variation in Gunshot Acoustics Experiments

Audio forensic examinations may involve recordings containing gunshot sounds. Forensic questions may include determining the number of shots, what firearms were involved, the position and orientation of the firearms, and similar queries of importance to the investigation. A common question is whether a sequence of shots is attributable to a single gun fired repeatedly, or whether the sounds could possibly come from two or more different guns fired in succession. This paper describes a controlled experiment comparing ten repeated shots from several common handguns and rifles. Differences between shots are observed, apparently attributable to subtle differences between ammunition cartridges. The shot-to-shot differences are also compared at different azimuths and between different firearms. Practical examples and applications are presented.


Who Fired When: Associating Multiple Audio Events from Uncalibrated Receivers

Audio recordings from crime scenes often involve multiple gunshots fired from multiple firearms. To further complicate crime scene audio analysis, there may be multiple recordings from spatially distributed recorders that are not time-synchronized and have uncertain locations. In this paper we describe the general Time Difference of Arrival (TDOA) problem with multiple sources and multiple receivers. From these equations, we show that for each source location, the time differences between any two impulsive events are identical at all non-synchronized (uncalibrated) receivers, assuming all locations remain stationary. Provided the receivers are not collinear, the time differences between events from different sources are not identical at different receivers and can be used to identify the presence of multiple sources. This timing analysis method is shown to be applicable in some cases where the recordings are noisy, distorted, and subject to interference.
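
As a rough numerical illustration of the timing property the abstract describes (not the authors' implementation; the geometry, firing times, and speed of sound below are invented for the example), the following sketch shows that the inter-event time difference from a single stationary source is identical at both receivers, while a difference involving a second source is not:

```python
import numpy as np

# Illustrative geometry (meters) and firing times (seconds); all values invented.
c = 343.0                                   # nominal speed of sound, m/s
receivers = [np.array([0.0, 0.0]), np.array([40.0, 25.0])]
src_a = np.array([10.0, 30.0])              # gun A, fired at t = 0.0 s and t = 1.5 s
src_b = np.array([55.0, 5.0])               # gun B, fired at t = 0.7 s

def arrival(src, t_fire, rx):
    """Arrival time of an impulsive event at a receiver (direct path only)."""
    return t_fire + np.linalg.norm(src - rx) / c

for i, rx in enumerate(receivers):
    t1 = arrival(src_a, 0.0, rx)            # shot 1 from gun A
    t2 = arrival(src_a, 1.5, rx)            # shot 2 from gun A
    t3 = arrival(src_b, 0.7, rx)            # shot from gun B
    # Same source: propagation delay cancels, so t2 - t1 = 1.5 s at every receiver.
    # Different sources: t3 - t1 depends on receiver position.
    print(f"receiver {i}: same-source dt = {t2 - t1:.6f} s, "
          f"cross-source dt = {t3 - t1:.6f} s")
```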


Detection of In-App Apple Voice Memos Edits through Extrapolation Artifacts

The continued development of the Apple® iOS Voice Memos application has added an increasing number of features, allowing users to edit a recording without having to export and process it in other software. This reduces the traces left by a manipulation, so research into the detection of edits performed within the application is vital. Previous research in this area has shown differences in bit rate between original and edited recordings within the application, and this research builds on that foundation by analyzing the artifacts left through extrapolation. Findings show that a sequence of processes, encompassing three stages of analysis, can be applied to detect local edit points pertaining to delete, replace, and long-pause events. The first stage is a global analysis used to sequester recordings that contain one of the aforementioned events, before two local analyses, based on different approaches, are applied to locate the edit points.


A Short-Time Cross Correlation with Application to Forensic Gunshot Analysis

A practical method for measuring waveform similarity of real recorded gunshot data in the presence of noise and interference is presented, including a frequency-domain sample cross-correlation method and a short-time cross-correlation technique that corresponds to the same time base as a short-time Fourier transform spectrogram. Forensic audio recordings often contain impulsive events like gunshots, and their examination may require comparative analysis using a measure of similarity between multiple events or a comparison with a separate but known recorded event. The sample cross correlation is a widely used and accepted statistical analysis procedure that has become a standard technique for measuring the similarity between two impulsive, finite-duration signals like recorded gunshot sounds. Problems interpreting correlation results can arise in real recorded data where the signals are noisy and cluttered with interfering signals. Standard signal processing techniques like filtering and spectral subtraction can distort short-duration, wideband signal waveforms like gunshots, and can significantly affect their cross-correlation results.
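
A minimal sketch of a short-time cross-correlation computed along an STFT-like frame grid, assuming frame and hop sizes chosen to match a spectrogram; the per-frame correlation below is circular, and the paper's exact formulation may differ:

```python
import numpy as np

def short_time_xcorr(x, y, frame_len=1024, hop=256):
    """Peak normalized cross-correlation per frame, on the same frame grid
    a spectrogram with these settings would use. The per-frame correlation
    is computed in the frequency domain and is therefore circular; this is
    a sketch, not the paper's exact formulation."""
    n = min(len(x), len(y))
    peaks = []
    for start in range(0, n - frame_len + 1, hop):
        xf = x[start:start + frame_len]
        yf = y[start:start + frame_len]
        X = np.fft.rfft(xf)
        Y = np.fft.rfft(yf)
        r = np.fft.irfft(X * np.conj(Y))        # circular cross-correlation
        denom = np.sqrt(np.sum(xf ** 2) * np.sum(yf ** 2))
        peaks.append(np.max(np.abs(r)) / denom if denom > 0 else 0.0)
    return np.array(peaks)                      # one similarity value per frame
```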


Bag-of-Features Models Based on C-DNN Network for Acoustic Scene Classification

This work proposes bag-of-features deep learning models for acoustic scene classification (ASC) – identifying recording locations by analyzing background sound. We explore the effect on classification accuracy of various front-end feature extraction techniques, ensembles of audio channels, and patch sizes from three kinds of spectrogram. The back-end process presents a two-stage learning model with a pre-trained CNN (preCNN) and a post-trained DNN (postDNN). Additionally, data augmentation using the mixup technique is investigated for both the pre-trained and post-trained processes, to improve classification accuracy by increasing class-boundary training conditions. Our experiments on the 2018 Challenge on Detection and Classification of Acoustic Scenes and Events - Acoustic Scene Classification (DCASE2018-ASC) subtasks 1A and 1B significantly outperform the DCASE2018 reference implementation and approach state-of-the-art performance for each task. Results reveal that the ensemble of multi-spectrogram features and data augmentation is beneficial to performance.
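
For reference, mixup (Zhang et al., 2018) blends pairs of training examples and their labels with a Beta-distributed weight. A minimal sketch, with the alpha value and patch dimensions chosen for illustration rather than taken from the paper:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4):
    """Blend two training examples and their one-hot labels with a
    Beta-distributed weight (mixup, Zhang et al. 2018). alpha = 0.4 is
    illustrative; the paper's setting may differ."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

# Example: mix two 128x128 spectrogram patches with 10-class one-hot labels.
x1, x2 = np.random.rand(128, 128), np.random.rand(128, 128)
y1, y2 = np.eye(10)[3], np.eye(10)[7]
x_mix, y_mix = mixup(x1, y1, x2, y2)
```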


Phonetic-Oriented Identification of Twin Speakers Using 4-Second Vowel Sounds and a Combination of a Shift-Invariant Phase Feature (NRD), MFCCs, and F0 Information

Automatic speaker identification typically relies on sophisticated statistical modeling and classification, which requires large amounts of data for good performance. However, in actual audio forensics casework, frequently only a few seconds of speech material are available. In this paper, we favor diversity in feature extraction, simple modeling and classification, and constructive combination of congruent classification scores. We use phase, spectral magnitude, and F0-related features in speaker identification experiments on a database of 35 speakers, most of whom are twins. Using only 4.4 s of vowel-like sounds per speaker, we characterize the performance reached with individual features and demonstrate simple yet effective ways of fusing classification scores. Insights for further research are also presented.
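
A minimal sketch of score-level fusion of the kind the abstract alludes to: z-normalize each subsystem's scores over the candidate speakers, then combine them with a weighted sum. The normalization and weighting are generic assumptions, not the paper's fusion rule:

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Z-normalize each subsystem's scores over the candidate speakers,
    then combine with a weighted sum. Generic sketch only."""
    weights = weights or [1.0] * len(score_lists)
    fused = np.zeros(len(score_lists[0]))
    for scores, w in zip(score_lists, weights):
        s = np.asarray(scores, dtype=float)
        fused += w * (s - s.mean()) / (s.std() + 1e-12)
    return fused

# Example: hypothetical scores from NRD (phase), MFCC, and F0 subsystems
# for 35 candidate speakers; the identified speaker is the argmax.
nrd, mfcc, f0 = (np.random.rand(35) for _ in range(3))
speaker = int(np.argmax(fuse_scores([nrd, mfcc, f0])))
```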


Forensic Authenticity Analyses of the Metadata in Re-Encoded M4A iPhone iOS 12.1.2 Voice Memos Files

Detailed digital data analyses were conducted of the metadata contained in a lossy and a lossless file produced with the iPhone Voice Memos app in iOS 12.1.2. The lossy file was re-encoded with four common audio/media editing programs, producing a total of five separate M4A files. The lossless file was exported to a new lossless file using one of the editing programs, and converted to a new lossless file with iTunes using an intermediate PCM WAV file. The purpose of this research was to identify changes made to the basic M4A file structure and metadata content by the re-encoding processes as they relate to forensic audio authenticity examinations. The research found that the re-encoding processes produced clearly discernible differences compared to the original M4A files. The paper presents the M4A metadata format structure, the procedures followed, the changes identified in the metadata of the re-encoded files, and a discussion of the authenticity implications.
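
As background, M4A files follow the ISO base media file format: a sequence of "atoms" (boxes), each beginning with a 4-byte big-endian size and a 4-byte type code. A minimal sketch that walks the top-level atoms for structure comparison; it is not the authors' procedure and omits many edge cases:

```python
import struct

def list_top_level_atoms(path):
    """Walk the top-level atoms of an M4A (ISO base media) file.
    Each atom starts with a 4-byte big-endian size and a 4-byte type;
    size == 1 means a 64-bit size follows, size == 0 means 'to end of file'.
    Minimal sketch for structure comparison, not a full forensic parser."""
    atoms = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, kind = struct.unpack(">I4s", header)
            name = kind.decode("latin-1")
            if size == 1:                       # 64-bit extended size
                size = struct.unpack(">Q", f.read(8))[0]
                payload = size - 16
            elif size == 0:                     # atom runs to end of file
                atoms.append((name, "to-EOF"))
                break
            else:
                payload = size - 8
            atoms.append((name, size))
            f.seek(payload, 1)                  # skip the atom body
    return atoms  # e.g. [('ftyp', 28), ('moov', ...), ('mdat', ...)]
```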


Beyond Equal-Length Snippets: How Long Is Sufficient to Recognize an Audio Scene?

Due to the variability in characteristics of audio scenes, some scenes can naturally be recognized earlier than others. In this work, rather than using equal-length snippets for all scene categories, as is common in the literature, we study the temporal extent to which an audio scene can be reliably recognized given state-of-the-art models. Moreover, as model fusion with deep network ensembles is prevalent in audio scene classification, we further study whether, and if so when, model fusion is necessary for this task. To achieve these goals, we employ two single-network systems relying on a convolutional neural network and a recurrent neural network for classification, as well as early fusion and late fusion of these networks. Experimental results on the LITIS-Rouen dataset show that some scenes can be reliably recognized within a few seconds while other scenes require significantly longer durations. In addition, model fusion is shown to be most beneficial when the signal length is short.
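
A minimal sketch of late fusion as a weighted average of the two networks' class posteriors; the equal weighting and the class count are assumptions for illustration, not the paper's scheme:

```python
import numpy as np

def late_fusion(prob_cnn, prob_rnn, w=0.5):
    """Weighted average of per-class posteriors from the CNN and RNN
    branches, followed by argmax. The equal weighting is an assumption."""
    p = w * np.asarray(prob_cnn) + (1.0 - w) * np.asarray(prob_rnn)
    return int(np.argmax(p))

# Example: posteriors from the two networks for one snippet, assuming
# 19 scene classes for illustration.
p_cnn = np.random.dirichlet(np.ones(19))
p_rnn = np.random.dirichlet(np.ones(19))
predicted_class = late_fusion(p_cnn, p_rnn)
```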


Improved Gunshot Classification by Using Artificial Data

Gunshot classification in audio files is used in forensics, surveillance, and multimedia analysis. In this contribution we show that data augmentation can enlarge the training set for a rare event like a gunshot with artificial data generated from a simple but sufficient model and a database of room impulse responses. The results indicate that the enlarged database significantly increases the accuracy in a classification task, even if no real data is used for training at all.
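
A minimal sketch of the augmentation idea, assuming a crude synthetic gunshot (broadband noise with an exponential decay, an invented stand-in for the paper's model) convolved with measured room impulse responses and mixed with noise:

```python
import numpy as np

def synth_gunshot(fs=16000, dur=0.05, decay=200.0):
    """Crude gunshot stand-in: broadband noise with an exponential decay.
    Purely illustrative; the paper's 'simple but sufficient model' may differ."""
    t = np.arange(int(fs * dur)) / fs
    return np.random.randn(t.size) * np.exp(-decay * t)

def augment(shot, rirs, snr_db=20.0):
    """Artificial training examples: convolve the model shot with each room
    impulse response and add noise at the requested SNR."""
    examples = []
    for rir in rirs:
        x = np.convolve(shot, rir)
        noise = np.random.randn(x.size)
        noise *= np.linalg.norm(x) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
        examples.append(x + noise)
    return examples
```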


Exploiting Frequency Response for the Identification of Microphone Using Artificial Neural Networks

Microphone identification addresses the challenge of identifying the microphone signature from the recorded signal. An audio recording system (consisting of microphone, A/D converter, codec, etc.) leaves its unique traces in the recorded signal. The microphone system can be modeled as a linear time-invariant system whose impulse response is convolved with the audio signal recorded through the microphone. This paper attempts to identify the microphone from its frequency response. To estimate the frequency response of a microphone, we employ a sine sweep method that is independent of speech characteristics. Sinusoidal signals of increasing frequencies are generated, and the audio of each frequency is subsequently recorded. Detailed evaluation of the sine sweep method shows that the frequency response of each microphone is stable. A neural-network-based classifier is trained to identify the microphone from the recorded signal. Results show that the proposed method achieves 100% microphone identification accuracy.
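
A minimal sketch of a stepped sine sweep measurement, assuming a hypothetical record() callback that plays a signal and captures the microphone output at the same length and sample rate; the paper's acquisition chain and frequency grid may differ:

```python
import numpy as np

def stepped_sine_response(record, freqs, fs=48000, dur=1.0):
    """Estimate a microphone's frequency response by playing pure tones of
    increasing frequency and measuring the captured amplitude at each one.
    `record(signal)` is a hypothetical playback-and-capture callback."""
    t = np.arange(int(fs * dur)) / fs
    response = []
    for f in freqs:
        captured = record(np.sin(2 * np.pi * f * t))
        # Least-squares amplitude of the f-Hz component of the capture.
        gain = 2.0 * np.abs(np.dot(captured, np.exp(-2j * np.pi * f * t))) / t.size
        response.append(gain)
    return np.array(response)   # feature vector for the neural network classifier
```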
