Human-computer Interaction is Transforming the World

Dr Joan Palmiter Bajorek, Founder of Women In Voice, talked about how emerging technologies in speech and voice are causing a systemic change in the industry.

Voice and conversational AI are emerging fields centred around speech technology. Human-computer interactions with voice technology are changing, and so are the people behind it.

In an interview, Dr Joan Palmiter Bajorek, Founder of Women In Voice, talked about how emerging technologies in speech and voice are causing a systemic change in the industry. She spoke about how we can overcome gender bias in voice technology, and the role of enterprise in moving the needle in the right direction.

Dr Bajorek is a technical advisor to several companies and startups. She has previously held the roles of Senior Conversational Experience Designer at Nuance, and Principal User Experience Researcher at the University of Arizona. Harvard Business Review published her PhD research. She is a regular contributor to Cambridge University Press, SoundHound, Adobe XD, and UXmatters. Her expertise includes voice products, bias in AI, platform disruption, and future multimodal and multilingual interfaces.

Excerpts from the interview:

Tell us a bit about the journey.

I am the founder of Women in Voice. I work in conversational AI. I love languages, and that’s how I started in this field. I have a master’s in linguistics, and a PhD in speech and language technology. Linguistics feeds into multimodal interfaces and human-computer interaction, and that is how we are pushing innovation and optimisation in all parts of the conversational AI tech stack.

How are emerging technologies in speech and voice causing a systemic change in the industry?

We have not yet seen what we expect to be realised in speech and voice technology. It is one of the big things. We think about IoT, and how far this has come. But the lights change when we think about the opportunity for voice and speech, that people can interact with their devices by speaking, and have different outputs.

We are seeing a lot of people trying out different devices. Here, in the United States, many people have smart home devices, whether they use them or not. But we see customer expectations changing, whether that’s for an enterprise where people expect to speak into an interface, or for day-to-day personal tasks such as getting information in a timely way.

You can find different places by talking to Google Maps. So we certainly see a change in the industry, but it’s still early days.

How can we overcome gender bias in voice technologies?

We are benchmarking for different languages and dialects. How well do these systems support this? In my research, looking at these interfaces, I realised that we have huge gender and race biases in these systems. Frequently, that’s directly related to the data set being used and leveraged to build these systems. Unfortunately, little biases in these systems produce pretty compounding effects.

So, we are all working on this problem. It’s also who’s at the table to decide what gets deployed at the end of the day. That’s crucial for my research but also for the role of the enterprise. They are constantly trying different tech stacks. They’re benchmarking performance. The role of the enterprise in pushing the needle would be to continually look at these numbers, look at demographic data, and consider what demographics the technology is supporting or not.

So, it’s up to all of us: consumers, who expect that it will perform well for everyone, as well as enterprises, which need to invest in how the products are being built, deployed and optimised.

Tell us about your research in voice and multimodal augmented reality (AR) and virtual reality (VR).

We can use technology specifically to interact with devices. It’s a kind of voice-first interaction: you speak to the device, it understands you, and it speaks back. But I see a world where you ask the lights to turn on or off, and it is done.

When we think about leveraging different modalities, whether we are thinking about AR, VR or speaking to it – this is enmeshed. Modalities can be leveraged in cool ways that are impactful for users. I did the research for my PhD on an immersive educational technology platform and product with an embedded speech recognition system. In this immersive environment, people were speaking into the interface and frequently forgot that they were not actually in that experience.

Especially as we see Web 3.0 and the Metaverse taking off, the idea that these modalities will be leveraged in different ways is a regular thing that users are speaking about. It’s not necessarily easy to build, but I believe that users will be expecting it as these things get built out.

How can people and companies get involved with Women in Voice?

Everyone is welcome at Women in Voice. We have a narrative of joining the party; we would love everyone to be there. Women in Voice’s mission is to build community, amplify and celebrate the work and talent of women and gender minorities, and equip everyone with professional development and opportunities.

We run an annual career development summit. Membership is something that people can sign up for. And we have a regular newsletter with a huge amount of free resources for everyone worldwide.

Allies are welcome as well. But the beating heart of this community is what can be found across our social media. Women in Voice’s social media reaches about 25,000 people a day across platforms. So, I recommend following us, joining and becoming a member.

One of the wonderful things about Women in Voice is that we are international by design. We have roughly 20 chapters in 15 countries, and we just launched in Africa. These phenomenal chapters held regular local events, pre-pandemic, that people could attend.

We don’t yet have ambassadors and chapters in Saudi Arabia. So I would be very interested in that, but it comes down to the local community. As different markets grow and evolve, women can be ambassadors in their regions. It usually takes three to five mid to senior-level women in a region to set up a chapter.
