User Study

In [1]:
import random
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import harmonicsonification as hs
hs.seed_everything(42)

Data Set Creation

We want to test whether our harmonic sonification approach allows users to distinguish typical data points from outliers based on sound alone. To this end, we first create an artificial data set with the characteristics typically obtained from applying PCA, i.e., dimensions with decreasing variance. We then create an outlier data set whose points are drawn uniformly from [-3, 3] in every dimension (some of these points will resemble typical data points, but most will fall outside the data distribution).
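As a rough check of this construction, the following sketch re-creates both sets with the same parameters as the next cell, but with its own arbitrarily seeded NumPy generator (so the samples differ from the notebook's). It compares how far points lie from the origin once each dimension is divided by its standard deviation. The names `rng` and `scaled_norm` and the seed are illustrative choices, not part of `harmonicsonification`.

```python
import numpy as np

# Sketch only: same parameters as the notebook, but an independent,
# arbitrarily seeded generator, so the samples differ from the notebook's.
rng = np.random.default_rng(0)
dim, n_data, uniform = 16, 100, 3
std = np.exp(-np.linspace(0, 4, dim))   # decreasing, PCA-like spread

data = rng.normal(size=(n_data, dim)) * std                    # typical points
outlier = rng.uniform(-uniform, uniform, size=(n_data, dim))   # outlier points


def scaled_norm(x, std):
    """Norm after dividing each dimension by its std.

    For zero-mean data with a diagonal covariance this is the
    Mahalanobis distance from the mean.
    """
    return np.linalg.norm(x / std, axis=1)


d_data = scaled_norm(data, std)
d_outlier = scaled_norm(outlier, std)
print(f"data:    median {np.median(d_data):7.1f}, max {d_data.max():7.1f}")
print(f"outlier: median {np.median(d_outlier):7.1f}, min {d_outlier.min():7.1f}")
```

In this scaled sense the outliers end up far beyond the data, mainly because the last dimensions have a standard deviation of only about 0.02 while a uniform coordinate typically has a magnitude around 1.5. In the first few dimensions, which dominate the pair plots, the two sets can still overlap.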

In [2]:
dim = 16                               # dimensions
n_data = 100                           # number of data points
std = np.exp(-np.linspace(0, 4, dim))  # standard deviations
uniform = 3                            # width of the uniform outlier distribution

columns = [f"dim {i + 1}" for i in range(dim)]
data = pd.DataFrame(np.random.normal(size=(n_data, dim)) * std, columns=columns)
outlier = pd.DataFrame(np.random.uniform(-uniform, uniform, size=(n_data, dim)), columns=columns)
data['type'] = 'data'
outlier['type'] = 'outlier'
print(f"{dim} dimensions")
print(f"{n_data} data points")
print(f"std: {std.round(2)}")
print(f"outliers in [-{uniform}, {uniform}]")
plt.plot(std, '-o');
16 dimensions
100 data points
std: [1.   0.77 0.59 0.45 0.34 0.26 0.2  0.15 0.12 0.09 0.07 0.05 0.04 0.03
 0.02 0.02]
outliers in [-3, 3]
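The standard deviations decay exponentially, mimicking the shrinking variance of successive principal components. Writing $d$ for `dim` and $k$ for the 1-based dimension index, `np.exp(-np.linspace(0, 4, dim))` computes

```latex
\sigma_k = \exp\!\left(-\frac{4\,(k-1)}{d-1}\right), \qquad k = 1, \dots, d,
```

so with $d = 16$ we get $\sigma_1 = 1$ and $\sigma_{16} = e^{-4} \approx 0.018$ (printed as 0.02), whereas every outlier coordinate is drawn from $\mathcal{U}(-3, 3)$, whose standard deviation is $3/\sqrt{3} = \sqrt{3} \approx 1.73$.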

We can create scatter plots for all pairs of dimensions, which give a rough idea of the two distributions.

In [3]:
combined = pd.concat([outlier, data], ignore_index=True)
p = sns.pairplot(combined, hue='type', palette=['blue', 'red'], diag_kind=None)
for ax in p.axes.flatten():
    ax.set_xlim(-uniform, uniform)
    ax.set_ylim(-uniform, uniform)

Sonification

For reference, the lowest and highest frequencies that may appear in the sonification.

In [4]:
base_freq = 110
amps = np.zeros(dim)
amps[-1] = 1
hs.sonify_am(x=amps, f0=base_freq).display()
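The internals of `hs.sonify_am` are not shown in this notebook. As a rough mental model only, the sketch below renders a harmonic tone by plain additive synthesis, *assuming* that amplitude `i` (0-based) weights a sine partial at `(i + 1) * f0`. Under that assumption the reference cell's only partial would sit at 16 × 110 Hz = 1760 Hz, and the trial sounds (with the fundamental prepended by `sonify`) would reach 17 × 110 Hz = 1870 Hz. The function `harmonic_tone`, the sample rate, and the duration are hypothetical; the real function may use a different mapping or synthesis method.

```python
import numpy as np


def harmonic_tone(amps, f0, sr=44100, duration=1.0):
    """Additive-synthesis sketch (hypothetical, not the library's code).

    ASSUMPTION: amps[i] is the amplitude of the partial at (i + 1) * f0.
    Returns the partial frequencies and a peak-normalised waveform.
    """
    amps = np.asarray(amps, dtype=float)
    freqs = f0 * np.arange(1, len(amps) + 1)      # harmonic series f0, 2*f0, ...
    t = np.arange(int(sr * duration)) / sr        # sample times in seconds
    wave = (amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)
    peak = np.abs(wave).max()
    if peak > 0:
        wave = wave / peak                        # keep samples in [-1, 1]
    return freqs, wave


# Reference tone from the cell above: only the last of 16 amplitudes is set.
amps = np.zeros(16)
amps[-1] = 1
freqs, wave = harmonic_tone(amps, f0=110)
print(freqs[amps > 0])   # [1760] under the stated assumption
```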

Helper functions to extract data points, randomise their order (for the experiment), and sonify them.

In [5]:
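def get_points(data, outlier, shuffle):
    """Return (points, labels) from the `.values` arrays of `data` and `outlier`.

    The last column of each array holds the 'type' string and is dropped.
    Either argument may be None to skip that set.
    """
    points = []
    for values, name in [(data, 'data'), (outlier, 'outlier')]:
        if values is not None:
            features = values[:, :-1].astype(float)  # drop the 'type' column
            points += [(p, f'{name} {i + 1}') for i, p in enumerate(features)]
    if shuffle:
        random.shuffle(points)  # random presentation order for the trials
    points, labels = zip(*points)
    return np.array(points), labels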
In [6]:
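def sonify(points, labels, std, base_freq, add_fundamental=True, label=False, print_amps=False):
    # Amplitude of each harmonic: |x_k| / std_k, i.e. how many standard
    # deviations the point lies from zero in dimension k ...
    points = np.abs(points) / std[None, :]
    # ... normalised by the largest value in the whole batch, so the loudest
    # harmonic across all points has amplitude 1.
    points /= points.max()
    for i, (p, l) in enumerate(zip(points, labels)):
        if add_fundamental:
            amps = [1] + list(p)  # prepend the fundamental at full amplitude
        else:
            amps = p
        amps = np.array(amps, dtype=float)
        # print the true label (for evaluation) or only the point index (for participants)
        if label:
            print(l)
        else:
            print(f"point {i + 1}")
        if print_amps:
            print(amps.round(2))
        hs.sonify_am(x=amps, f0=base_freq).display()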
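To make the amplitude mapping in `sonify` concrete, here is a worked example on made-up numbers (two points, two dimensions; not taken from the data set). Each coordinate becomes |x_k| / σ_k, and everything is then divided by the largest such value in the *whole batch*, so the loudness of one point's harmonics depends on which other points are sonified with it. The helper `amplitudes` is a hypothetical re-implementation of just those lines, without any audio.

```python
import numpy as np


def amplitudes(points, std, add_fundamental=True):
    """Re-implementation of the amplitude mapping used in `sonify` (sketch)."""
    a = np.abs(np.asarray(points, dtype=float)) / std[None, :]  # |x_k| / sigma_k
    a /= a.max()                                                 # batch-wide max becomes 1
    if add_fundamental:
        a = np.hstack([np.ones((len(a), 1)), a])                 # fundamental at amplitude 1
    return a


std = np.array([1.0, 0.1])
points = np.array([[0.5, 0.05],   # "typical": half a std in each dimension
                   [2.0, 1.0]])   # "outlier": 10 stds in the second dimension
print(amplitudes(points, std))
```

The scaled values are 0.5 and 0.5 for the typical point and 2 and 10 for the outlier, so after dividing by 10 the typical point's harmonics are both about 0.05, while the outlier's second harmonic reaches full amplitude next to the fundamental.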

Example Data

Here are some typical data points as well as some outliers.

In [7]:
n_examples = 5
points, labels = get_points(data.values[:n_examples], outlier.values[:n_examples], shuffle=False)
In [8]:
sonify(points, labels, std, base_freq, label=True)
data 1
data 2
data 3
data 4
data 5
outlier 1
outlier 2
outlier 3
outlier 4
outlier 5

Trials

Next, we take data points and outliers that were not used in the examples, shuffle them randomly, and let participants guess which is which.

In [9]:
n_test = 10
points, labels = get_points(data=data.values[n_examples:n_examples+n_test], 
                            outlier=outlier.values[n_examples:n_examples+n_test], 
                            shuffle=True)

For the evaluation, we print the correct labels once in presentation order and then play each sound together with its true label.

In [10]:
for l in labels:
    print(l)
print("--------------------")
sonify(points, labels, std, base_freq, label=True)
outlier 10
data 6
outlier 5
data 5
data 10
outlier 4
outlier 6
outlier 9
data 7
outlier 3
outlier 8
outlier 1
data 2
outlier 2
data 3
outlier 7
data 8
data 9
data 1
data 4
--------------------
outlier 10
data 6
outlier 5
data 5
data 10
outlier 4
outlier 6
outlier 9
data 7
outlier 3
outlier 8
outlier 1
data 2
outlier 2
data 3
outlier 7
data 8
data 9
data 1
data 4

Finally, we sonify the same points again but print only the point index; this is the version shown to participants.

In [11]:
sonify(points, labels, std, base_freq)
point 1
point 2
point 3
point 4
point 5
point 6
point 7
point 8
point 9
point 10
point 11
point 12
point 13
point 14
point 15
point 16
point 17
point 18
point 19
point 20
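Once a participant has guessed "data" or "outlier" for each point, the guesses can be compared with the label list printed earlier. The sketch below is one possible way to do this, not the study's actual evaluation: it scores guesses against `labels` and adds an exact one-sided binomial test against random guessing (p = 0.5) using only the standard library. The function names and the example guesses are hypothetical and are not results from the study.

```python
from math import comb


def score(labels, guesses):
    """Count correct guesses; labels look like 'data 3' or 'outlier 7'."""
    truth = [label.split()[0] for label in labels]   # keep only 'data' / 'outlier'
    if len(truth) != len(guesses):
        raise ValueError("need exactly one guess per point")
    return sum(t == g for t, g in zip(truth, guesses))


def binomial_p_value(k, n, p=0.5):
    """One-sided exact test: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 4 points and a participant's made-up guesses.
labels = ("outlier 2", "data 1", "data 2", "outlier 1")
guesses = ["outlier", "data", "outlier", "outlier"]
k = score(labels, guesses)
print(f"{k}/{len(labels)} correct, p = {binomial_p_value(k, len(labels)):.3f}")
```

With the 20 points used here (10 data points, 10 outliers), pure guessing gives 10 correct answers on average; 15 or more correct would occur by chance with a probability of about 0.021.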