Effects of Unreliable Group Profiling by means of Data Mining

[short paper]

Bart Custers

Tilburg University, Faculty of Law, P.O. Box 90153,

5000 LE Tilburg, The Netherlands


Abstract. With the rise of data mining technologies, group profiling, i.e. ascribing characteristics to groups of people, has increasingly become a useful tool for policy-making, direct marketing, etc. However, group profiles usually contain statistics and therefore the characteristics of group profiles may be valid for the group and for individuals as members of that group, though not for individuals as such. When individuals are judged by group characteristics they do not possess as individuals, this may strongly influence the advantages and disadvantages of using group profiles.

However, striving for more reliable group profiles only provides a partial solution to this problem, since perfectly reliable group profiles may still result in unjustifiable treatment of people. A broader solution to deal with the disadvantages of group profiles may be found in developing new ethical, legal, and technological standards that adequately recognize the possible harmful consequences of particular types of information.

Keywords. Data mining, KDD, group profiling, personal data, data protection, reliability, distributivity, security, selection, stigmatization, confrontation, ethics.

1 Introduction

Information and communication technologies have resulted in large databases with enormous amounts of data. From the need to discover knowledge from these large amounts of data, data-mining techniques have been developed in order to find patterns and relations in data. When characteristics are ascribed to people, we speak of profiles. Profiles concerning individuals are called personal profiles, sometimes also referred to as individual profiles or customer profiles. A personal profile is a property or a collection of properties of a particular individual. Profiles concerning a group of persons are referred to as group profiles. Thus, a group profile is a property or a collection of properties of a particular group of people.

Ascribing characteristics to individuals may be done either correctly or incorrectly (see footnote 1). If an individual is judged on information that was wrongly ascribed to him, most legal systems provide opportunities to have the information changed or deleted, possibly combined with compensation for damages.

Group characteristics are more complex: they may be correct for the group as a whole and members of that group, though not for individuals as such. To explain the difference, we may use the following example.

Suppose in street A 80 percent of the people wear glasses. Without any further knowledge, it may be suggested that there is a high probability (80 percent) that a person living in street A wears glasses. This is when this person is regarded as a member of the group of people living in street A.

When these persons are considered as individuals as such, it will be clear immediately who wears glasses and who does not.
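The street example can be sketched in a few lines of Python. The list of residents below is invented for illustration; only the 80 percent figure comes from the example above.

```python
# Hypothetical residents of street A: True means the person wears glasses.
# The list is made up; it only reproduces the 80 percent share from the example.
residents = [True] * 8 + [False] * 2

# Viewed as group members: all that is known is the share within the group.
share = sum(residents) / len(residents)

# Viewed as individuals as such: each person's own value is known, so it is
# clear how many people would be misjudged if the group characteristic
# were simply ascribed to everyone in the street.
misjudged = sum(1 for wears_glasses in residents if not wears_glasses)
```

In this sketch the group statement is accurate for the street as a whole, while ascribing it to every resident would be wrong for two of the ten people.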

It may be argued that, when group characteristics are incorrectly ascribed to individuals, there should be a right for people to have information changed or deleted. However, since group data is often anonymous data, it is usually not protected by data protection laws.

Besides, most people are unaware of the group profiles they are being judged upon (see footnote 2).

2 Risks and benefits of group profiles

The use of group profiles may have various advantages and disadvantages.

Starting with some general advantages, the search for patterns and relations in data may provide overviews of large amounts of data, facilitate the handling and retrieving of information, and help the search for immanent structure in nature. More closely related to the goals of particular users, group profiles may enhance efficacy (achieving more of the goal) and efficiency (achieving the goal more easily). Here, efficiency often means cost efficiency. Group profiles usually require less information than individual profiles (although they may be less reliable). Group data is usually anonymous data and is therefore, in most (notably European) countries, not protected by data protection law, which means that no costly and time-consuming effort for obtaining informed consent has to be made.

Footnote 1: For inference errors that may occur when ascribing characteristics, see [1].

Footnote 2: Many authors urge more openness towards data subjects and the public in general concerning the collection and use of data. See for instance [2].

Group profiling also provides more opportunities for selecting targets.

For instance, members of a high-risk group for lung cancer may be identified and treated earlier, and people not interested in cars will no longer receive direct mail about the subject. So-called hit ratios will increase with the help of profiling, and new groups of customers or risk-bearers may also be discovered.

Most of the disadvantages of using group profiles are closely connected to their advantages. One of the main applications of group profiles is selection, as indicated above. However, much selection may be unwanted or unjustified. When selection for jobs is performed on the basis of medical profiles, this may soon lead to discrimination (see footnote 3). Unjustified selection may also occur in cases of purchasing products, acquiring services, applying for loans, applying for benefits, etc.

Some of the group profiles constructed by companies, government, or researchers may also become ‘public knowledge,’ which may lead to the stigmatization of particular groups. Another disadvantage may occur when people are confronted with information about a group they belong to.

When supposedly healthy people are confronted with the fact that they will have only a limited lifetime left, this may upset their lives and the lives of others. In some cases, people may prefer not to know their prospects while they are healthy.

Although it may seem that group profiles lead to a more individual approach (e.g. by customization), the use of group profiles may in fact lead to de-individualization. This is a paradox, but group profiles result in a tendency to judge and treat people on the basis of their group characteristics instead of their own individual characteristics and merits [4]. Thus, the use of profiles may lead to a more one-sided treatment of individuals. As I will show in the next section, the effects of all these risks and benefits of group profiles are strongly influenced by the reliability of the profiles and their use.

3 Reliability

When discussing the reliability of group profiles, it is important to distinguish distributive group profiles from non-distributive group profiles. Distributivity means that a property in a group profile is valid for each individual member of a group; non-distributivity means that a property in a group profile is valid for the group and for individuals as members of that group, though not for those individuals as such [5].
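The distinction can be made concrete with a minimal sketch. The records and field names below are hypothetical and serve only to illustrate the definition: a property is treated as distributive when it holds for every individual member.

```python
# Hypothetical group members; the field names are illustrative only.
street_a = [
    {"lives_in_street_a": True, "wears_glasses": True},
    {"lives_in_street_a": True, "wears_glasses": True},
    {"lives_in_street_a": True, "wears_glasses": False},
]

def is_distributive(members, prop):
    """True if the property holds for every individual member of the group."""
    return all(member[prop] for member in members)

# The address is shared by everyone, so that property is distributive.
address_is_distributive = is_distributive(street_a, "lives_in_street_a")

# Wearing glasses holds for most members but not all: non-distributive.
glasses_is_distributive = is_distributive(street_a, "wears_glasses")
```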

The reliability of a group profile may influence the effects, both positive and negative, of the use of the profile. The reliability of a group profile may be divided into two factors. The first is the reliability of the profile itself and the second is the reliability of its use.

Footnote 3: A case study in the U.S. showed that discrimination as a result of access to genetic information resulted in loss of employment, loss of insurance coverage, or ineligibility for insurance. All cases of discrimination were based on the future potential of disease rather than existing (symptoms of) diseases [3].

The creation of group profiles consists of several steps, in which errors may occur [6]. First, the data on which a group profile is based may contain errors, or the data may not be representative of the group it tries to describe. Furthermore, if samples are taken, the group should be large enough to give reliable results.
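The point about size can be illustrated with the standard error of a proportion. This is a sketch under the assumption of a simple random sample, which the text does not state; the figures are invented.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion p, assuming a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

# Invented figures: the same observed share of 80 percent in a small and a large sample.
se_small = standard_error(0.8, 25)
se_large = standard_error(0.8, 2500)
```

Under this assumption the uncertainty around the observed share shrinks as the sample grows, which is one way to read the requirement that the group be large enough.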

In the data preparation phase, data may be aggregated, missing data may be searched for, superfluous data may be deleted, etc. All these actions may lead to errors. For instance, missing data is often made up, as suggested by the fact that a disproportionately large number of people in databases appear to have been born on the 1st of January (1-1 is the easiest to type) [7].
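A simple check for this kind of made-up data might look as follows. The records are invented, and the factor of ten used as a warning threshold is an arbitrary assumption for the sketch, not a value from the literature.

```python
from collections import Counter
from datetime import date

def january_first_share(birth_dates):
    """Share of records whose birth date falls on 1 January."""
    counts = Counter((d.month, d.day) for d in birth_dates)
    return counts[(1, 1)] / len(birth_dates)

# Invented records for illustration.
records = [date(1970, 1, 1), date(1982, 1, 1), date(1965, 6, 14), date(1990, 11, 3)]

share = january_first_share(records)

# If birthdays were spread evenly, roughly 1 in 365 would fall on any given day.
expected = 1 / 365.25

# Arbitrary threshold (assumption): flag the data if the share is ten times the expected one.
suspicious = share > 10 * expected
```

A flagged result does not show which individual records are wrong; it only suggests that the field may contain placeholder values.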

The actual data mining consists of a mathematical algorithm. There are different algorithms, each having its strengths and weaknesses. Using different data-mining programs to analyse the same database may lead to different group profiles. The choice of algorithm is very important and the consequences of this choice for the reliability of the results should be realized. For instance, in the case of a classification algorithm, the chosen classification criteria determine most of the resulting distribution of the subjects over the classes.
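The effect of the chosen criteria can be shown with a toy classifier. The scores and the two cut-off values are invented; the sketch only illustrates that the same data yields different class distributions under different criteria.

```python
from collections import Counter

# Invented risk scores for six subjects.
scores = [0.12, 0.35, 0.48, 0.52, 0.67, 0.91]

def classify(values, cutoff):
    """Count how many subjects fall in the 'high' and 'low' class for a given cut-off."""
    return Counter("high" if value >= cutoff else "low" for value in values)

# Same data, two hypothetical classification criteria.
with_cutoff_05 = classify(scores, 0.5)
with_cutoff_03 = classify(scores, 0.3)
```

With the cut-off at 0.5 half of the subjects end up in the high class; with the cut-off at 0.3 all but one do, although nothing about the subjects has changed.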

As far as the reliability of the use of group profiles is concerned, this depends on the interpretation of the group profile and the actions that are taken upon (the interpretation of) the group profile. As was explained above, both the interpretation and the actions determined depend on whether people are regarded as members of the group or as individuals as such.

It should be noted that a perfectly reliable use of a group profile, i.e. 100 percent of the group members sharing the characteristic, does not necessarily imply that the results of the use are fair or desirable. This may occur especially in the case of negative characteristics, for instance when all members of a group consisting only of handicapped people are refused a particular insurance. Although the use of the group profile is perfectly reliable, it is not justified.

Note that the difference between regarding people as group members or as individuals is not applicable to future properties. For instance, an epidemiological group profile with the characteristic that 5 per cent of a particular group will die from a heart attack does not provide any information on the question whether Mr. Smith, who is a member of this group, will die from a heart attack. And since Mr. Smith himself has no additional information on this, his perspective as a group member is no different from the perspective of someone outside the group.

The fact that in non-distributive profiles not every group member has the group characteristic has different consequences depending on whether the characteristic is generally regarded as negative or positive. This is illustrated in Figure 1.

People in category A have the disadvantages of sharing the negative group characteristic and of being treated on the basis of this negative profile. This may result in an accumulation of negative things: first, there is the negative health prospect; on the basis of this prospect stigmatization and selection for jobs, insurances, etc., may follow.

In category B people have the disadvantage of being treated as if they have the negative characteristic, although this is not the case. There may be an opportunity for these people to prove or show they do not share the characteristic, but they are ‘guilty until proven innocent.’ Sometimes, proving exceptions is useless anyway, for instance when a computer system does not allow exceptions or when handling exceptions is too costly or time-consuming.

Fig. 1. Not every group member necessarily has a group characteristic. This has different consequences depending on whether the characteristic is negative or positive.

Sometimes people in category B may have an advantage. This is the case when measures are taken to improve the situation of the people with the negative characteristic. For instance, when the government decides to grant extra money to a group with a very low income, some group members not sharing this characteristic may profit from this.

People in category C have the advantage of having the (positive) group characteristic as well as being treated on the basis of it. Similar to the people in category A, this may be accumulative. The group may get the best offers for jobs, insurance, loans, etc.

Finally, category D contains the people who do not share the positive group characteristic. Their advantage may be that they are being treated on the basis of a positive characteristic, but the disadvantage is that they are not recognized as not having the positive characteristic or even having a negative characteristic. Lack of such recognition may become a problem when measures are taken to help the people with negative characteristics.

For instance, people in category D may not be recognized as people running a high risk of developing colon cancer and are thus easily forgotten in government screening programs.

From Figure 1 it becomes clear that there is a difference between correct treatment and fair treatment. People in categories A and C can be said to be treated correctly since they are treated on the basis of a characteristic they in fact have. Whether this treatment is also fair remains to be seen. Accumulation of negative things for people in category A and of positive things for people in category C may lead to polarization.

People in categories B and D do not have the characteristic in the profile of the group they belong to and are therefore being treated incorrectly. Incorrect treatment very probably also implies unfair treatment since it does not take into account the actual situation people are in.
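The four categories discussed above can be summarised in a short sketch. The function names are hypothetical; the mapping follows the descriptions of categories A to D in the text, and correctness here only means that the person actually has the characteristic he or she is treated on, not that the treatment is fair.

```python
def category(profile_is_negative, has_characteristic):
    """Map a group member to category A, B, C or D as described in the text."""
    if profile_is_negative:
        return "A" if has_characteristic else "B"
    return "C" if has_characteristic else "D"

def treated_correctly(cat):
    """Members of A and C are treated on a characteristic they in fact have."""
    return cat in ("A", "C")

# A member of a group with a negative profile who does not share the characteristic.
example = category(profile_is_negative=True, has_characteristic=False)
example_is_correct = treated_correctly(example)
```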

4 Concluding remarks

As was shown in the previous sections, the reliability of a group profile may strongly influence its advantages and disadvantages. It is, however, clear that striving for more distributive group profiles will only provide a partial solution to this problem. Perfectly reliable profiles may still be used unjustifiably. The other end of the spectrum, i.e. prohibiting group profiles altogether, will not be a realistic solution either.

A broader solution to the disadvantages of group profiles will have to be sought in new ethical and legal standards posing smart restrictions on the availability and use of particular types of information. Such restrictions may be enforced by law and regulations in combination with several security techniques [8]. Security techniques with regard to data mining do not only concern access controls, but also flow controls and inference controls [7], [9].
