Unbiased data – the key to fair AI

Before I started running a company where we designed our own IoT sensors, I hadn’t thought much about why a sensor should be mounted in a certain way. A temperature reading was just a temperature, right? As long as the right data came in, everything was fine. Or was it?

Now that AI is entering the picture, I’ve come to understand the importance of getting things right from the start. Most of us have at some point encountered an AI response that felt off—or even downright disappointing. A hallucinated summary, a mistranslation, an answer that ignored the obvious, or a smart system making a dumb decision.

Sometimes it’s entertaining, and probably good material for an after-work laugh or a client lunch story. But when AI decisions are used in physical systems—to control devices, react in emergencies, or influence outcomes in buildings, healthcare, or industry—it’s no longer funny when the data hallucinates. That’s when things can go seriously wrong.

This article explores a particular AI vulnerability that too often goes unnoticed: bias—and how it creeps in through sensors and IoT infrastructure. When data collection meets edge computing and machine learning, even the best-intended AI logic can go wrong. And if your AI is trained, validated, or deployed using biased data, the outputs won’t be neutral either. Worse, they may reinforce and legitimize flawed assumptions.

I am an IoT expert, but I needed a deeper understanding of how bias can spoil the fun. Let’s look at how bias travels from the edge to the algorithm—and why understanding IoT’s role is essential if we want trustworthy AI.

How bias affects the outcome when AI and IoT interact

Unlike traditional IT systems, where data is input manually or through standardized software, IoT systems collect vast volumes of real-time data from physical sensors embedded in complex and diverse environments. That data often forms the foundation for training the AI model and, later, for the decisions it makes. If the source data is not aligned with reality, the entire AI pipeline is at risk of inheriting and amplifying those flaws. It is like construction: if you build your house on swampy ground, it will eventually collapse.

When bias finds its way into IoT systems, it can reinforce or even amplify existing inequalities—resulting in systematic disadvantages for certain groups. The consequences go far beyond technical performance: they extend into trust, ethics, and compliance. And the risk is real: discriminatory AI practices can trigger significant legal and financial consequences, with the EU’s AI Act threatening fines of up to 7% of a company’s global turnover.

Where bias originates in IoT data

Bias in IoT-based systems can originate from multiple sources—both technical and socio-technical. Sensors and networks inevitably introduce noise, drift, and missing values over time. Sensor data anomalies such as outliers, bias, missing values, and degradation due to aging or environmental factors all play a role. Temperature and air quality sensors, for example, may drift as they age or react to changing weather conditions, requiring regular recalibration to maintain data integrity. The most common problem, though, is probably humans: sensors mounted in the wrong spot produce data that simply isn’t correct. A room-temperature sensor placed in direct sunlight will, compared with correctly mounted sensors, introduce bias. Hardware faults or intermittent connectivity issues can lead to incomplete time series. At my company, AKKR8, we developed a sensor with this in mind: it can store thousands of values offline.
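
To make the misplaced-sensor problem concrete, here is a minimal sketch (not our actual implementation) of how a sensor that deviates systematically from its correctly mounted peers can be flagged. The sensor names and the 2-degree threshold are made up for illustration.

```python
import statistics

def flag_outlier_sensors(readings: dict[str, list[float]], threshold: float = 2.0) -> list[str]:
    """Flag sensors whose average reading deviates strongly from the peer median.

    `readings` maps a sensor id to its recent values; `threshold` (assumed to be
    in degrees Celsius) is a made-up tuning parameter, not a standard value.
    """
    averages = {sensor: statistics.mean(values) for sensor, values in readings.items()}
    peer_median = statistics.median(averages.values())
    return [sensor for sensor, avg in averages.items() if abs(avg - peer_median) > threshold]

# Example: one room sensor mounted in direct sunlight reads systematically high.
rooms = {
    "room_a": [21.0, 21.2, 21.1],
    "room_b": [20.8, 21.0, 20.9],
    "room_c": [27.5, 28.1, 27.9],  # the misplaced sensor
}
print(flag_outlier_sensors(rooms))  # ['room_c']
```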

Unbalanced sampling and data collection in IoT

IoT systems often collect data from a limited environment or subset of the population. A typical example is smart city infrastructure, where sensor deployment tends to be denser in city centers, while underserved areas remain poorly covered. IoT sensors are increasingly deployed for traffic management, air quality monitoring, population counting, and more. However, the spatial distribution of sensors often lags behind actual societal needs.

This even has a name: “sensor deserts”—zones with little or no data collection. As a result, entire environments or groups may become invisible in the dataset, leading to underrepresentation and skewed analytics. If some areas are not properly represented, policy interventions risk being based on data from wealthier districts, introducing geographic and social bias into public decision-making.

Sampling bias also occurs when data is disproportionately collected during specific times, in specific locations, or from particular user groups. In research, this is referred to as selection bias or representation bias—where algorithms are trained on statistically skewed input data and therefore produce invalid or non-generalizable results when applied to broader or different populations.
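
Before any training happens, this kind of sampling gap can be surfaced with a simple coverage check. The sketch below counts readings per zone and flags districts that fall below an assumed minimum share of the data; the zone names and the 5% cut-off are invented for the example.

```python
from collections import Counter

def coverage_report(readings: list[dict], expected_zones: set[str], min_share: float = 0.05) -> dict:
    """Count readings per zone and flag potential "sensor deserts".

    `readings` are records with a 'zone' field and `expected_zones` lists the
    districts that should be represented; the 5% minimum share is an assumption.
    """
    counts = Counter(r["zone"] for r in readings)
    total = sum(counts.values())
    deserts = [z for z in expected_zones if counts.get(z, 0) / max(total, 1) < min_share]
    return {"counts": dict(counts), "sensor_deserts": sorted(deserts)}

# Example: the suburb and the industrial park barely show up in the data.
data = [{"zone": "centre"}] * 950 + [{"zone": "suburb"}] * 30
print(coverage_report(data, {"centre", "suburb", "industrial_park"}))
# {'counts': {'centre': 950, 'suburb': 30}, 'sensor_deserts': ['industrial_park', 'suburb']}
```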

Bias: the socio-technical layer

Let us get back to the sensor that was not mounted correctly. That is a human error that shapes the sensor readings. Many other human choices shape the architecture of IoT systems. If hardware and algorithms are developed without diversity in mind, bias can quietly make its way into the system, well beyond sensor bias. For instance, datasets used for facial or voice recognition are sometimes skewed toward white or male users. Without intentional inclusion at the design stage, even well-engineered IoT solutions risk perpetuating inequality through digital means.

Health and fitness IoT

Wearable technologies such as smartwatches often rely on optical sensors to measure biometric data. Multiple studies have shown that heart rate monitoring using green LED light is significantly less accurate for individuals with darker skin tones. One study, for instance, found that commercial wearables measure heart rate with significantly lower accuracy for individuals with darker skin tones and obesity. In practical terms, a health-monitoring system may completely fail to detect critical physiological signals from specific users, and sadly often from the very group that may need it the most.

Voice assistants and user interfaces

British studies have demonstrated that voice assistants misrecognize users with non-American accents at nearly twice the rate of white speakers. Smart speakers (Alexa, Siri, and others) are IoT devices that depend on voice recognition models trained on heavily skewed datasets. Research has found that Alexa is roughly 30% less accurate at understanding non-American accents. From my own experience using Google’s similar Home Assistant, I fail to talk to it most of the time, sadly. That may be because of my Swenglish. The error rate increases further when users speak in contemporary dialects or informal phrasing. On top of that, these assistants are often “feminized” by design—critics argue this reflects conscious or unconscious choices that reinforce gender stereotypes. Such embedded biases not only degrade the user experience for groups like non-native speakers or people with hearing impairments, but also contribute to the persistence of problematic social norms.

Transport and autonomous vehicles

Modern vehicles are equipped with a suite of IoT sensors—radar, lidar, cameras—all critical for safe navigation. Research suggests that current AI models for pedestrian detection in self-driving cars perform less reliably when identifying children and individuals with darker skin tones. This disparity introduces a dangerous form of bias into transportation systems, where life-and-death decisions hinge on accurate perception by machines. In a project in the forestry industry, Skogforsk investigated the performance of self-driving vehicles in the forest. Their problem was identifying what was solid ground rather than moss over a bottomless swamp. A self-driving car in the city has its challenges, but going off-road adds a whole new layer of them.

Industry 4.0 and manufacturing

In industrial settings, IoT sensors are widely used to monitor machine health and optimize production processes. However, predictive maintenance systems trained only on data from certain machine types or operating conditions may fail when applied to other variants. For instance, a manufacturing company may overlook critical maintenance needs if sensors are primarily installed on newer machines in central halls, while older equipment in peripheral areas remains unmonitored. With broader training data, that can of course be avoided.

Quality control

If AI algorithms are trained only on clean, well-annotated datasets, they risk misclassifying components that deviate in ways not represented during training. While specific case studies in the literature are still scarce, analogies can be drawn from well-documented failures in algorithmic recruitment and credit scoring. In those cases, biased systems systematically ignored particular groups or conditions, resulting in significant fairness, safety, and operational risks. Similar patterns are entirely plausible in IoT-driven manufacturing environments—especially when training data is incomplete or skewed. AI models tend to favor the semi-skimmed middle of the distribution rather than the anomalies.

An AI system trained on skewed data will show reduced accuracy and reliability—especially in cases involving underrepresented users. This reflects a deeper issue: IoT systems learn from the world as it has been documented, not as it truly is. And when the datasets mirror existing social inequalities, the technology inevitably does too. Poorly designed technical systems don’t just reflect bias—the risk is that they amplify it.

Bias damages trust

Bias erodes trust. When users realize that their device doesn’t work as well for them as it does for the majority, confidence in the product—and the company behind it—plummets. Trust, once lost, is notoriously hard to regain.

For companies, biased data is more than a theoretical problem—it’s a commercial risk. Products that underperform for specific segments lose market traction. Customers abandon them, and brand value suffers. One well-known example from outside the IoT sphere is Amazon’s AI-driven recruitment tool, which had to be scrapped after it was revealed to favor male candidates over female ones. The lesson has direct relevance to the IoT sector: failure to address bias isn’t just an ethical lapse or something you can brush off; it’s a competitive liability.

In 2025 the political landscape has shifted away from fairness and transparency, but let us focus on the general human expectation: most of us expect fairness and transparency. Companies that neglect bias today could find themselves tomorrow’s target of customer backlash, negative press, or regulatory scrutiny. Precisely because AI services are so easy to use, it takes deliberate care not to take shortcuts.

One example is the battery-replacement image that AI created for me. Will one of these characters have to travel back to the office to fetch a different battery size?

Legal consequences: Non-compliance is expensive

Regulatory pressure is growing. In the EU, AI applications driven by IoT are covered by the new AI Act, as well as by existing anti-discrimination laws. High-risk systems must be documented with rigorous risk assessments, including proof that their data is “representative, error-free, and statistically sound.”

An IoT application that leads to indirect discrimination can breach equality laws. At the same time, GDPR gives individuals the right to have their data deleted or to opt out of automated decision-making—both of which apply to profiling conducted by smart devices. Systems that fail to comply risk legal action, significant fines, and long-term reputational damage. Do you use AI services based on data that someone wants to opt out from?

Detection and mitigation of bias

To address bias, IoT companies must actively analyze and adjust their data and AI pipelines:

Data auditing and analysis

A foundational method is to examine how data is distributed across relevant groups (skin tone, age, gender). By simulating scenarios or running controlled tests, one can compare a model’s outcomes across different subgroups. Transparency, traceability, and logging are essential here—EU initiatives such as the AI Act require that high-risk systems document their decision-making cycles and data flows.
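
As a minimal illustration of such an audit, assuming a log of predictions tagged with a (hypothetical) group label, per-subgroup accuracy can be compared like this:

```python
def subgroup_accuracy(records: list[dict]) -> dict[str, float]:
    """Compare model outcomes across subgroups.

    Each record is assumed to carry a 'group' label, the model 'prediction',
    and the observed 'actual' value; the field names are illustrative only.
    """
    correct: dict[str, int] = {}
    total: dict[str, int] = {}
    for r in records:
        g = r["group"]
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(r["prediction"] == r["actual"])
    return {g: correct[g] / total[g] for g in total}

# Example: accuracy is noticeably lower for group B, which should trigger a closer look.
log = [
    {"group": "A", "prediction": 1, "actual": 1},
    {"group": "A", "prediction": 0, "actual": 0},
    {"group": "B", "prediction": 1, "actual": 0},
    {"group": "B", "prediction": 1, "actual": 1},
]
print(subgroup_accuracy(log))  # {'A': 1.0, 'B': 0.5}
```

In a real audit, the same comparison would be repeated for other metrics (such as false negatives) and other group attributes, and the results logged as part of the documentation trail.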

Calibration and data improvement

At the sensor level, regular calibration is essential to prevent drift. Data can be cleansed of obvious errors and noise using statistical methods that are specifically designed to detect anomalies in sensor data.
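
One simple statistical approach, sketched here for a single temperature series, is a robust z-score based on the median absolute deviation. Real pipelines would typically apply it per sensor over rolling windows and tune the threshold to the sensor type.

```python
import statistics

def flag_anomalies(values: list[float], z_threshold: float = 3.0) -> list[int]:
    """Return indices of readings that deviate strongly from the series median.

    Uses a robust z-score based on the median absolute deviation (MAD); the
    threshold of 3 is a common rule of thumb, not a universal constant.
    """
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values) or 1e-9
    return [i for i, v in enumerate(values) if abs(v - median) / (1.4826 * mad) > z_threshold]

# Example: a single spike in an otherwise stable temperature series.
series = [21.1, 21.0, 21.2, 35.4, 21.1, 20.9]
print(flag_anomalies(series))  # [3]
```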

Fairness-oriented techniques

For algorithms, several mitigation strategies exist. One approach is to reweight target variables so that errors affecting disadvantaged groups are penalized more heavily. Alternatively, fairness metrics (such as equal opportunity, demographic parity, etc.) can be integrated into model evaluation. In practice, specialized tools can detect disparities and suggest rebalancing strategies.
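
As a back-of-the-envelope sketch of both ideas, with made-up field names and only two groups for brevity: demographic parity is measured here as the gap in positive-prediction rates, and reweighting gives records from the underrepresented group proportionally more weight. Dedicated fairness toolkits offer far more complete implementations.

```python
def demographic_parity_gap(records: list[dict]) -> float:
    """Difference in positive-prediction rate between two groups (0 means parity).

    Each record is assumed to hold a 'group' label ('A' or 'B') and a binary
    'prediction'; the field names are illustrative, not from a specific toolkit.
    """
    rates = {}
    for g in ("A", "B"):
        preds = [r["prediction"] for r in records if r["group"] == g]
        rates[g] = sum(preds) / len(preds)
    return abs(rates["A"] - rates["B"])

def reweight(records: list[dict]) -> list[float]:
    """Give each record a weight inversely proportional to its group's share,
    so that errors on underrepresented groups count more during training."""
    counts: dict[str, int] = {}
    for r in records:
        counts[r["group"]] = counts.get(r["group"], 0) + 1
    return [len(records) / (len(counts) * counts[r["group"]]) for r in records]

# Example: group B is underrepresented and receives fewer positive predictions.
data = [
    {"group": "A", "prediction": 1}, {"group": "A", "prediction": 1},
    {"group": "A", "prediction": 0}, {"group": "B", "prediction": 0},
]
print(demographic_parity_gap(data))  # roughly 0.67
print(reweight(data))                # roughly [0.67, 0.67, 0.67, 2.0]
```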

Process and awareness

Beyond technical measures, it is crucial to incorporate inclusive design principles and domain-specific expertise during development. Cross-functional teams—comprising data engineers, domain experts, and business analysts—can prevent overreliance on historically biased data. Companies should issue bias risk statements and comply with industry standards (algorithmic hygiene). The EU’s AI Act also emphasizes that high-risk systems must undergo ongoing quality assessments of their data foundations, which supports continuous data auditing. By implementing regular evaluation routines and enabling users to report issues (like a “report bias” button), organizations can detect problems early and maintain trust.

Final reflections: design for fairness

Bias in IoT data is a complex issue that demands insight into both technical and human factors. Technical causes (such as sensor errors or sampling bias) and socio-technical decision-making (e.g., data selection, interface design) can interact and reinforce one another. AI integration requires building fairness into system architecture from the start. Bias isn’t just a byproduct of flawed data; it’s a systemic design challenge that must be tackled from the sensor layer to the AI model to the decision logic.

Bias is already a threat in cloud-scale models trained on internet data. But when AI moves into physical environments, the stakes change. In an AIoT world, fairness must be designed in—not hoped for. No data is neutral. No sensor is perfectly unbiased. Every time we measure the world, we make choices.

It starts with a simple question: Are we building systems that truly see everyone—or only those already overrepresented?