People Lie — Getting Reliable Answers

The use of weighted bipartite matching and other mathematical methods to find near-perfect matches for large numbers of people depends ultimately on collecting good information.  There are two problems with this.  The first is that many people do not like to fill out questionnaires at all; that is dealt with on another page.  The problem addressed here is more serious: people lie.

The solution to this problem is the creation of preference-neutral questionnaires, ones in which there are no obviously better answers.

A questionnaire could ask:

Do you like music? — Yes or No.

Do you like art? — Yes or No.

The preferred answer in each case is Yes.  Few would answer otherwise.  The amount of information collected by asking these questions is limited.  A question with no preferred answer is this:

Which do you prefer?  Music, or the Visual Arts.

Similarly, one might have asked:

Do you like indoor activities such as reading or watching television?  Yes or No?

Do you like outdoor activities such as riding or hiking?  Yes or No?

In both cases Yes is the preferred answer, and that is especially true for the second question.  But because they both have preferred answers the amount of information collected is low.  A better question would be:

Which do you prefer?  Indoor activities such as reading or watching television, or outdoor activities, such as riding or hiking.
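One way to make "the amount of information collected is low" concrete is Shannon entropy, which measures in bits how much an answer can tell us.  The sketch below is only an illustration: the answer shares (95% Yes for the loaded question, an even split for the neutral one) are assumptions, not survey data, and the function name is my own.

```python
import math

def answer_entropy(shares):
    """Shannon entropy, in bits, of a question whose answers occur with the given shares."""
    return -sum(p * math.log2(p) for p in shares if p > 0)

# Hypothetical answer shares, chosen only for illustration.
loaded = answer_entropy([0.95, 0.05])   # "Do you like music?" where nearly everyone says Yes
neutral = answer_entropy([0.5, 0.5])    # "Music, or the visual arts?" with an even split

print(f"loaded question:  {loaded:.2f} bits")
print(f"neutral question: {neutral:.2f} bits")
```

Under these assumed shares the loaded question yields about 0.29 bits per respondent, and the neutral one a full bit.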

Another question concerns age.  A lot of people lie about their age.  A question which asks how old you are is likely to produce an answer between twenty and forty, with only limited honesty.  A question which doesn’t get much information but is more reliable is simply this:

Would you rather be two years younger than you are, or two years older?

For a few people that would be a tough question, but most have an answer.  People less than 22 would probably prefer to be two years older.  People older than 26 would probably prefer to be two years younger.  The answer doesn’t tell us much, but the question would be much more likely to receive an honest answer than a more specific one.

Other questions which might tell us something about age, without being too loaded are:

Do you have any children?

Do you plan on having a child or another child at any time in the future?

Do people your exact age seem too old for you, or too young for you?

Which would be more important for you in a spouse?  Financial stability now, or the potential for future wealth?

None of these questions is conclusive, but they have advantages.  They all give us other information besides age, and they are all more or less neutral, without an obvious preferred response.

The key is to ask a lot of questions which each contain a tiny hint about a person’s age, rather than a single question which is likely to be answered with a lie.
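How many tiny hints add up can be sketched with a simple Bayesian update over age bands.  Everything below is hypothetical: the bands, the flat prior and the answer probabilities are placeholders made up for illustration, and the sketch assumes the answers are independent given the age band (a "naive Bayes" simplification).

```python
# Age bands and a flat prior; every number here is a made-up placeholder.
BANDS = ["under 25", "25 to 40", "over 40"]
PRIOR = [1 / 3, 1 / 3, 1 / 3]

# P(answer | band) for a few neutral questions (hypothetical values).
LIKELIHOOD = {
    "would rather be two years older": [0.7, 0.3, 0.1],
    "has children": [0.1, 0.5, 0.8],
    "prefers financial stability now": [0.4, 0.5, 0.7],
}

def update(belief, answer):
    """One Bayes step: weight each band by how likely the answer is, then renormalise."""
    weighted = [b * l for b, l in zip(belief, LIKELIHOOD[answer])]
    total = sum(weighted)
    return [w / total for w in weighted]

belief = PRIOR
for answer in ["has children", "prefers financial stability now"]:
    belief = update(belief, answer)

for band, p in zip(BANDS, belief):
    print(f"{band}: {p:.2f}")
```

No single answer settles the matter, but with these placeholder numbers two weak hints already move the estimate for the oldest band from one third to roughly two thirds.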

The same approach might be taken with other sensitive matters, such as sexual preferences.  A question which also contains some hint of a person’s age could provide a tiny clue which might help to identify those with unsuitable sexual preferences:

Which would you be more likely to desire?  A person two years younger than yourself, or one two years older.

What this reveals depends partly on the person’s gender.  Men tend to prefer slightly younger women, unless very young themselves.  Women tend to prefer slightly older men, but the question of financial stability enters into it.  An older woman is unlikely to prefer an older man whose personal wealth is merely a future prospect.

These should not be taken as ageist or sexist remarks, just guesses.  Until extensive studies of such questionnaires on large samples yield good statistical data, this is mostly speculation.  Informed speculation, I hope, but not yet trustworthy.

A large number of questions should be asked, all of them as preference-neutral as possible.  The problem of persuading people to answer so many questions is addressed elsewhere.

Once a set of answers is obtained from a well-designed questionnaire, the standard next steps can be taken.  Suppose 100 questions have been asked and answered.  Factor analysis can be applied to these answers to produce a list of factors in declining order of importance, ranked by the eigenvalues of the correlation matrix of the answers.  See the underlying mathematics on a page discussing them (when available; check back here if it isn’t yet).  The most significant of those factors can be chosen, perhaps the top ten.  Then factor rotation should be applied, so that the resulting ten numbers are of roughly equal importance.
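The extraction step can be sketched in a few lines.  This is a minimal illustration, not a full factor analysis: it uses principal-component extraction (eigenvectors of the correlation matrix, found by power iteration) as a stand-in, it omits rotation, and the answers are simulated from two made-up hidden traits rather than taken from a real questionnaire.  All function names are my own.

```python
import random
import statistics

rng = random.Random(0)

def simulate(n_people=300):
    """Simulated answers to 8 questions: 5 driven by one hidden trait, 3 by another."""
    rows = []
    for _ in range(n_people):
        trait_a, trait_b = rng.gauss(0, 1), rng.gauss(0, 1)
        row = [trait_a + rng.gauss(0, 0.5) for _ in range(5)]
        row += [trait_b + rng.gauss(0, 0.5) for _ in range(3)]
        rows.append(row)
    return rows

def correlation_matrix(rows):
    """Pearson correlations between the columns (questions) of the answer table."""
    cols = list(zip(*rows))
    n = len(rows)
    z = []
    for col in cols:
        mean, sd = statistics.fmean(col), statistics.pstdev(col)
        z.append([(x - mean) / sd for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def top_factors(matrix, k, iterations=200):
    """Top-k (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.01 * i for i in range(size)]  # arbitrary starting vector
        for _ in range(iterations):
            nxt = [sum(w * v for w, v in zip(row, vec)) for row in work]
            norm = sum(x * x for x in nxt) ** 0.5
            vec = [x / norm for x in nxt]
        # Rayleigh quotient of the converged unit vector is the eigenvalue.
        value = sum(v * sum(w * u for w, u in zip(row, vec)) for v, row in zip(vec, work))
        pairs.append((value, vec))
        # Deflate: subtract the factor just found so the next pass finds the next one.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(size)]
                for i in range(size)]
    return pairs

factors = top_factors(correlation_matrix(simulate()), k=3)
values = [value for value, _ in factors]
for rank, value in enumerate(values, start=1):
    print(f"factor {rank}: eigenvalue {value:.2f}")
```

In this simulation the first two eigenvalues stand well above the third, which reflects the two hidden traits built into the fake data.  With real answers the cut-off is a judgment call.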

The next question, the biggest one of all, is what counts as a good match?  People who have nearly the same values are not necessarily compatible, either as friends or potential spouses.  Only empirical analysis can determine the answer (although there may be an a priori solution, presented on another page, which may or may not be available yet).  An empirical solution would be to look at the sets of numbers (personality vectors) for successful and unsuccessful couples.  A neural network could probably be used for this.  It could be trained to accept twenty numbers, ten for each person in a couple, with successful or unsuccessful as the output.  Each training example could be just those twenty numbers, from two ten-component personality vectors, plus a semi-subjective evaluation of whether the couple was a good or bad match.  After enough training on a large enough dataset, the resulting network should be able to evaluate the compatibility of any two individuals presented to it.  Its output for any set of twenty input values could be a single compatibility estimate, perhaps on a scale of 1 to 10.
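A minimal sketch of such a network follows, assuming one hidden layer and plain gradient descent.  The training couples are synthetic and their good/bad labels come from an arbitrary placeholder rule, so the trained network says nothing about real compatibility.  The layer size, learning rate and all names are my own assumptions.

```python
import math
import random

rng = random.Random(0)
N_IN, N_HID = 20, 6  # twenty inputs (two ten-number vectors), six hidden units

def make_couple():
    """Synthetic couple: twenty numbers plus a made-up good/bad label."""
    a = [rng.gauss(0, 1) for _ in range(10)]
    b = [rng.gauss(0, 1) for _ in range(10)]
    # Arbitrary placeholder rule so the network has something to learn.
    label = 1.0 if a[0] + b[1] > 0 else 0.0
    return a + b, label

data = [make_couple() for _ in range(300)]

net = {
    "w1": [[rng.uniform(-0.3, 0.3) for _ in range(N_IN)] for _ in range(N_HID)],
    "b1": [0.0] * N_HID,
    "w2": [rng.uniform(-0.3, 0.3) for _ in range(N_HID)],
    "b2": 0.0,
}

def forward(x):
    """Return hidden activations and the predicted probability of a good match."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    z = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    z = max(-30.0, min(30.0, z))  # keep exp() well inside floating-point range
    return hidden, 1.0 / (1.0 + math.exp(-z))

def train(epochs=20, rate=0.05):
    """Plain stochastic gradient descent on the cross-entropy loss."""
    for _ in range(epochs):
        for x, y in data:
            hidden, p = forward(x)
            err = p - y  # loss gradient at the output, before the sigmoid
            for j in range(N_HID):
                grad_h = err * net["w2"][j] * (1.0 - hidden[j] ** 2)
                net["w2"][j] -= rate * err * hidden[j]
                net["b1"][j] -= rate * grad_h
                for i in range(N_IN):
                    net["w1"][j][i] -= rate * grad_h * x[i]
            net["b2"] -= rate * err

def compatibility(person_a, person_b):
    """Rescale the predicted probability to a 1-to-10 estimate."""
    _, p = forward(person_a + person_b)
    return 1.0 + 9.0 * p

train()
alice = [rng.gauss(0, 1) for _ in range(10)]
bob = [rng.gauss(0, 1) for _ in range(10)]
print(f"compatibility estimate: {compatibility(alice, bob):.1f}")
```

Note that nothing forces `compatibility(alice, bob)` to equal `compatibility(bob, alice)`: the network sees the two vectors in a fixed order, so a couple may need to be scored from both sides.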

These compatibility estimates should be collected for every possible pairing in a pool of candidates.  If there were twenty people, ten men and ten women, all single, heterosexual and otherwise possible matches, each of the twenty should be compared with the ten people of the opposite sex.  That makes a total of two hundred comparisons, two for each of the one hundred possible couples (one from each partner’s side).  Combining each couple’s two estimates into one score, for instance by averaging them, gives a ten by ten matrix, a suitable input for a weighted bipartite matching algorithm designed to provide the best overall assignment of men and women, as discussed on the matching algorithm page.
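To show what "best overall assignment" means, here is a small sketch on a four by four matrix of made-up scores.  It is not the algorithm from the matching page: it simply tries every one-to-one pairing, which is only feasible for tiny pools.  For realistic pool sizes a proper assignment algorithm (the Hungarian method, available for example as `linear_sum_assignment` in SciPy) would be the usual choice.

```python
from itertools import permutations

# Hypothetical compatibility scores (1 to 10): rows are men, columns are women.
scores = [
    [7, 3, 9, 2],
    [4, 8, 5, 6],
    [6, 5, 7, 9],
    [5, 4, 10, 5],
]

def best_assignment(matrix):
    """Try every one-to-one pairing and keep the one with the highest total score."""
    n = len(matrix)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(matrix[i][perm[i]] for i in range(n))
        if best_total is None or total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(enumerate(best_perm))

total, pairs = best_assignment(scores)
print("total score:", total)
for man, woman in pairs:
    print(f"man {man} with woman {woman} (score {scores[man][woman]})")
```

With these scores the best total is 34.  Man 0 is paired with woman 0 (score 7) rather than his top-scoring match, woman 2 (score 9), because woman 2 scores 10 with man 3.  That is the difference between optimising the whole pool and letting each person take their best available match.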