How we built a Bayesian Network System with no Prior Data

Shelvi Garg
6 min read · Nov 25, 2020

In this blog, I explain how we built a Bayesian Network system with limited or no data but strong domain knowledge.

For the complete blog series, refer to https://www.vaktavya.co.in/blog

In the previous blog of the series, we saw how Evidence-Based Family-Centric Welfare Delivery can be a potential alternative to the Program-Centric Welfare delivery system and its inefficient identification of beneficiaries for the delivery of welfare services. In this blog, we will show you how the idea of Evidence-Based Family-Centric Welfare Delivery is being implemented by EasyGov’s AI Tool.

Are you often torn between starting AI development on whatever data is available and gathering the right data first? Does the prospect of spending a whole year “collecting and cleansing data” before actually building any AI system make you or your enterprise abandon AI projects? You are not alone: current statistics indicate that while 76 percent of enterprises aim to leverage their data to extract business value, only 15 percent have access to the appropriate type of data to reach that goal.

Most data scientists spend only 20 percent of their time on actual data analysis and 80 percent of their time finding, cleaning, and reorganizing huge amounts of data before any AI development can start.

Data scarcity has emerged as a major challenge. Currently, data is in its raw form just like coal was in the early years of the Industrial Revolution. Thomas Newcomen, in 1712, invented a primitive version of the steam engine that ran on coal, about 60 years before James Watt did. Newcomen’s invention wasn’t very good: compared to Watt’s machine, it was inefficient and costly to run. That meant it was put to work only in coalfields — where the fuel was plentiful enough to overcome the machine’s handicaps.

Similarly, all around the world, there are hundreds of Newcomens working on their own machine learning models. Those models might be revolutionary, but without the data to make them work, they may never get off the ground.

The problem is further complicated in sectors where meaningful data is difficult to collect and comprehend. Unlike consumer Internet companies, which have data from billions of users to train powerful AI models, collecting massive training sets is often not feasible in other sectors such as health care and policy-making.

The following are the main issues with the sorry state of current data stockpiles:

  1. Required data is not available: Enterprises have access to more data today than ever before, yet datasets from which AI applications can actually learn are rare.
  2. Available data is static: Data changes over time, quickly making static snapshots redundant.
  3. Available data is of low quality: Data-driven expectations cannot be fulfilled unless the data is fit to be used with advanced AI-powered analytics systems.
  4. Data silos: Data exists but is locked inside systems and is not accessible.

The real issue is not only finding the relevant data but also making our deep learning systems more efficient and able to work with less data. Just as Watt’s improvements to the steam engine took decades to arrive, finding ‘AI-fit data’ might well take another 60 years! We at EasyGov could not wait that long, so we created our own way of doing it right, right now.

An alternative way of developing AI solutions despite the existing complexity of data is to leverage ‘domain expert knowledge’, also called the ‘Human-in-the-Loop’ approach. Human-in-the-loop AI combines machine and human intelligence to create machine learning-based AI models. It is powered by human intelligence derived from deep domain knowledge instead of large datasets.

Over a period of time, as data is accumulated, knowledgeable assumptions are replaced with data-driven intelligence and modeling. Through such collaborative intelligence, humans and AI actively enhance each other’s complementary strengths: the leadership, teamwork, creativity, and social skills of the former, and the speed, scalability, and quantitative capabilities of the latter. What if data is not available? Our human intelligence is still available, right?
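To make that hand-over concrete, here is a minimal Python sketch of how an expert-assumed probability can be refined once real data starts to arrive. The repair-need scenario, the prior strength, and the survey counts are all illustrative assumptions, not figures from our system.

```python
# A minimal sketch of how an expert-assumed probability can be refined once
# real data arrives. The scenario and numbers are illustrative assumptions.

# Expert belief: roughly 30% of rural families need house repair.
# Encode it as a Beta prior whose "strength" is worth ~20 observations.
prior_mean, prior_strength = 0.30, 20
alpha = prior_mean * prior_strength          # 6 pseudo "needs repair" cases
beta = (1 - prior_mean) * prior_strength     # 14 pseudo "no repair" cases

def update(alpha, beta, needs_repair, total):
    """Bayesian update of the Beta prior with newly collected survey data."""
    return alpha + needs_repair, beta + (total - needs_repair)

# As field data accumulates, the expert prior is gradually dominated by evidence.
alpha, beta = update(alpha, beta, needs_repair=130, total=300)
posterior_mean = alpha / (alpha + beta)
print(f"Updated estimate: {posterior_mean:.2f}")   # ~0.42, driven mostly by data
```

The more data is collected, the less the final estimate depends on the original expert guess, which is exactly the behaviour we want from a human-in-the-loop system.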

To demonstrate this capability, let us assume a use case where a Government has a requirement to provide a house repair intervention to eligible beneficiaries. The major limitations it would face while developing a solution are limited data and mostly static data points about citizens.

For delivering housing repair benefits to an eligible beneficiary, the Government department needs to identify the eligible beneficiaries and the right benefits to provide. The following steps can be taken to achieve this:

STEP 1: Identification of Socio-Economic Parameters

The very first step in identifying eligible beneficiaries is the identification of socio-economic parameters, such as house value, house type, amenities available in the house, house location, family income, etc.
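As a rough illustration of this step, the parameters and the discrete states each can take can be written down explicitly before any graph is built. The names and states below are assumptions chosen only for this example.

```python
# A hedged sketch of Step 1: candidate socio-economic parameters and the
# discrete states each can take. Names and states are illustrative assumptions.
socio_economic_parameters = {
    "house_type":     ["kutcha", "semi_pucca", "pucca"],
    "house_value":    ["low", "medium", "high"],
    "amenities":      ["none", "basic", "full"],
    "house_location": ["rural", "urban"],
    "family_income":  ["low", "medium", "high"],
}

for name, states in socio_economic_parameters.items():
    print(f"{name}: {states}")
```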

STEP 2: Creation of Residence Profile Graph

Once the socio-economic parameters have been identified, these parameters are arranged to build a structured profile graph. Example: in this case, a Residence Profile Graph is drawn.

The graph helps in building a household profile for a family by identifying or calculating the values required to determine that profile, which in turn helps the AI system recommend a housing repair intervention for the family.
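The sketch below shows what such a sub-graph might look like, written with the open-source pgmpy library (used here purely for illustration; it is not necessarily the library behind our tool). Every edge is an assumed parent-to-child relationship invented for this example.

```python
# A minimal sketch of Step 2: a Residence Profile sub-graph as a directed
# acyclic graph. All edges below are illustrative assumptions.
from pgmpy.models import BayesianNetwork

residence_profile = BayesianNetwork([
    ("house_location", "house_value"),     # location influences house value
    ("house_type", "house_value"),         # construction type influences value
    ("family_income", "house_type"),       # income influences construction type
    ("family_income", "amenities"),        # income influences available amenities
    ("house_type", "needs_house_repair"),  # type and amenities drive repair need
    ("amenities", "needs_house_repair"),
])

print(residence_profile.nodes())
print(residence_profile.edges())
```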

STEP 3: Bayesian Network Inference Graph

A graph is then created by networking all the major socio-economic parameters, i.e., by merging the sub-graphs required to build a complete family profile. This exercise results in an Inference Graph.

The graph helps in building a complete household profile for the family. It computes all the values required to characterise a family, including social standing, economic status, educational qualification, health parameters, available amenities, demographic information, etc. Computing these attributes helps our AI system recommend benefits based on a multidimensional view.

For Example:

  1. The House Repair Intervention depends not only on a family’s housing profile but also on its economic and social profiles.
  2. Similarly, if the Government wants to distribute stipends to farmers’ children who are pursuing graduation or post-graduation, the decision will be based upon the following (a code sketch of the merged graph follows this list):
     - The possibility of the presence of a graduate-age child in a farmer family: Demographic Profile
     - The farmer belonging to a particular economic class, i.e., the student should not come from a rich family: Economic Profile
     - The student’s eagerness to pursue graduation or post-graduation, i.e., the student must have a high matriculation score: Education Profile
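The sketch below, again using pgmpy with entirely assumed node names, shows how the demographic, economic, and education sub-graphs from the stipend example could be merged into one Inference Graph feeding a single recommendation node.

```python
# A hedged sketch of Step 3: merging profile sub-graphs into one inference graph.
# All node names and edges are illustrative assumptions.
from pgmpy.models import BayesianNetwork

demographic_profile = [("occupation", "has_graduate_age_child")]
economic_profile    = [("family_income", "economic_class")]
education_profile   = [("matric_score", "pursuing_graduation")]

# The recommendation node depends on evidence drawn from all three profiles.
stipend_links = [
    ("has_graduate_age_child", "stipend_eligible"),
    ("economic_class", "stipend_eligible"),
    ("pursuing_graduation", "stipend_eligible"),
]

inference_graph = BayesianNetwork(
    demographic_profile + economic_profile + education_profile + stipend_links
)
print(inference_graph.edges())
```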

STEP 4: Assigning Conditional Probability Distributions

Once the structure is built (i.e., the nodes and links of the graph), the network requires a probability distribution to be assigned to each node. These probabilities are assigned using our expert domain knowledge. Example: in this case, the Conditional Probability Distribution for the Basic Amenities Profile is assigned this way.
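As a hedged illustration of this step, the snippet below assigns an expert-elicited conditional probability table to an assumed "amenities" node with pgmpy's TabularCPD; every number in it is invented for the example.

```python
# A minimal sketch of Step 4: an expert-elicited CPD for an illustrative
# "amenities" node standing in for the Basic Amenities Profile.
from pgmpy.factors.discrete import TabularCPD

# P(amenities | family_income): columns correspond to income = low, medium, high.
cpd_amenities = TabularCPD(
    variable="amenities", variable_card=3,
    values=[
        [0.60, 0.25, 0.05],   # amenities = none
        [0.30, 0.50, 0.25],   # amenities = basic
        [0.10, 0.25, 0.70],   # amenities = full
    ],
    evidence=["family_income"], evidence_card=[3],
    state_names={
        "amenities": ["none", "basic", "full"],
        "family_income": ["low", "medium", "high"],
    },
)
print(cpd_amenities)
```

Each column sums to one, i.e., for every income level the expert distributes the full probability mass across the possible amenity states.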

STEP 5: Recommendations

The final AI recommendation for the beneficiary is based on these parameters, node linkages, and probability distributions.

For example, the system can present the recommended benefits for a family for the year 2022 and onwards as percentages, with the highest-probability recommendation being the most strongly recommended government intervention for that family.
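Below is a small end-to-end sketch of how such a recommendation score could be computed with pgmpy. The tiny two-parent model and every probability in it are illustrative assumptions standing in for the full merged inference graph.

```python
# A hedged end-to-end sketch of Step 5: querying the network for a recommendation.
# The model structure and all probabilities are illustrative assumptions.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([
    ("family_income", "house_repair_intervention"),
    ("house_type", "house_repair_intervention"),
])

cpd_income = TabularCPD("family_income", 2, [[0.7], [0.3]],
                        state_names={"family_income": ["low", "high"]})
cpd_house = TabularCPD("house_type", 2, [[0.6], [0.4]],
                       state_names={"house_type": ["kutcha", "pucca"]})
cpd_repair = TabularCPD(
    "house_repair_intervention", 2,
    # columns: (income=low, type=kutcha), (low, pucca), (high, kutcha), (high, pucca)
    values=[[0.10, 0.60, 0.55, 0.95],   # recommend = no
            [0.90, 0.40, 0.45, 0.05]],  # recommend = yes
    evidence=["family_income", "house_type"], evidence_card=[2, 2],
    state_names={"house_repair_intervention": ["no", "yes"],
                 "family_income": ["low", "high"],
                 "house_type": ["kutcha", "pucca"]},
)
model.add_cpds(cpd_income, cpd_house, cpd_repair)
assert model.check_model()

# Given what we know about a family, the posterior acts as the recommendation score.
infer = VariableElimination(model)
result = infer.query(["house_repair_intervention"],
                     evidence={"family_income": "low", "house_type": "kutcha"})
print(result)   # ~90% "yes" -> strongly recommended intervention for this family
```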

There is a saying: “First you get the data, then you get the AI.” We cannot allow the non-availability of data to be an obstacle to what our minds can do! Through our illustration, we have shown that by using human ‘deep domain knowledge’, the limitations of data need not delay AI solution development, and that this new horizon of human-AI collaboration can help in building robust and realistic AI solutions.

Please read part four of the blog to understand how to build a more responsible AI tool.

Originally published at https://www.vaktavya.co.in on November 25, 2020.
