MC433 - week 9
These are my notes from November 23 for MC433 at the London School of Economics for the 2017-2018 school year. I took this module as part of the one-year Inequalities and Social Science MSc program.
The usual disclaimer: all notes are my personal impressions and do not necessarily reflect the view of the lecturer.
Prediction, Accuracy, and (Un)fairness
Readings
Data mining and the discourse on discrimination (PDF) by Solon Barocas
A short paper from 2014 surveying the literature on how data mining can be used for discrimination:
- different forms of discrimination:
- inferring protected attributes (race, gender, etc) from correlated proxies, as a way of circumventing anti-discrimination laws (see the sketch after this list)
- statistical bias based on who is contributing the data (as a way of exacerbating existing inequality)
- decisions based on erroneous inferences
- ability to segregate shared risk pools based on individual data
- predictive policing (vicious cycle etc)
- different manifestations (not sure how this differs ontologically from the first)
- deliberate attempts to discriminate against protected classes in a way that is hard to detect
- errors in the data mining process
- when data mining gives decision-makers too much power (fair procedures -> unfair outcomes)
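To make the proxy-inference point concrete, here's a toy sketch of my own (not from the paper): a model trained on synthetic loan data never sees the protected attribute, yet reproduces historical bias through a correlated postcode feature. Every name and number below is invented, and it assumes numpy and scikit-learn are available.

```python
# Toy sketch (mine, not from the Barocas paper): a model never sees the
# protected attribute, yet discriminates through a correlated proxy.
# All data is synthetic and all names are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

# Protected attribute (e.g. minority status) -- withheld from the model.
protected = rng.integers(0, 2, n)

# Proxy feature (e.g. postcode), 90% correlated with the protected
# attribute due to historical segregation.
postcode = np.where(rng.random(n) < 0.9, protected, 1 - protected)

# A legitimately relevant feature, independent of group membership.
income = rng.normal(50, 10, n)

# Historical outcomes are biased: qualified members of group 1 were
# still denied 40% of the time.
qualified = income > 50
unfair_denial = (protected == 1) & (rng.random(n) < 0.4)
approved = (qualified & ~unfair_denial).astype(int)

# Train on postcode + income only; the protected attribute is excluded.
X = np.column_stack([postcode, income])
preds = LogisticRegression().fit(X, approved).predict(X)

# Predicted approval rates still diverge by group: the proxy carried
# the historical bias straight through.
for g in (0, 1):
    print(f"group {g}: predicted approval rate = {preds[protected == g].mean():.2f}")
```

The predicted approval rates diverge by group even though the protected attribute was withheld from training, which is exactly the "implicit" discrimination the paper describes.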
Data justice (PDF) by Nathan Newman
On data as the bedrock of the new digital economy, which makes it a matter of economic justice, not just personal privacy. Thanks to network effects and the like, the incumbent companies are ossifying their positions and the possibility of serious competition is declining. Suggests a combination of bottom-up and top-down approaches: greater consumer awareness and agency, but also better regulation. Some problems outlined: differential pricing; algorithmic profiling; using private data to enforce the will of employers; predatory debt (more than usual, anyway); the death of local journalism as advertiser money flees. Basically these companies are undermining all the presumed benefits of capitalism (which are at least semiotically important to keep people believing in the system) while accelerating all the rapacious elements. My favourite takeaway: Nicholas Carr’s term “digital sharecropping”, in which these companies take advantage of an “incredibly efficient mechanism to harvest the economic value of the free labor provided by the very many and concentrate it into the hands of the very few”.
Could have used some more editing (like, what is meetu.com?) and it’s a little naive in taking corporate claims at face value. Could also have gone into the attention-economy aspects, or how the “value” these companies generate actually fits into the broader economic picture (imo they’re really just parasitic on commodity production). I have this hypothesis that the ad tech industry is just concentrated capitalism (the combination of its death drive & the technology necessary to fulfill it), which this report doesn’t really address at all, but I guess that just means there is a gap in the metaphorical market of ideas which I now have a duty to fill.
Worth reading, though.
Opening data zine by the Detroit Digital Justice Coalition
A (beginner’s) guide to data justice.
Lecture
- We’ve covered the one-to-many and many-to-many communication paradigms
- Now, onto communication involving machines (machine-human, human-machine, machine-machine) making decisions to nudge people to behave in a certain way
- of course, this can intersect with (and exacerbate) existing inequalities or create new ones
- some terminology defined in the slides
- predictive analytics: using techniques (machine learning, statistical modeling, etc) to make predictions
- data mining: a form of machine learning using large datasets (kinda vaguely delimited)
- we’re shown this YouTube video from “in-store data analytics firm” RetailNext
- depicts a fairly dystopian world where the process of buying things from a physical store becomes more and more about data & prediction
- which makes sense in light of brick-and-mortar fears of irrelevance in the age of ecommerce—RetailNext is just capitalising on those fears
- for example, their app will suggest to managers when to schedule staff breaks, based on predicted customer activity
- she mentions Cathy O’Neil’s Weapons of Math Destruction & Frank Pasquale’s The Black Box Society
- algorithms need to be something of a black box in order to work effectively and at scale
- we need to differentiate between discrimination that is intentional, and that which is not
- example of intentional discrimination: redlining in Detroit to prevent ethnic minorities from getting mortgages (using geography as a cover)
- example of perhaps less intentional discrimination: stop & frisk via predictive policing, where biased records feed a feedback loop (see the sketch after this list)
- two axes we should consider: whether the data is accurate, and whether the outcome is beneficial
- information asymmetry is exploited by companies that have data on you -> becomes economic inequality
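The predictive-policing feedback loop is worth a quick sketch. This is a toy simulation of my own (not from the lecture, and not any real policing model): two districts with identical true crime rates, but patrols are allocated wherever crime was previously recorded, and crime can only be recorded where patrols are. The 80/20 split and every number here are invented.

```python
# Toy feedback-loop simulation (mine, not from the lecture): identical true
# crime rates, but "hotspot" predictions reinforce themselves because crime
# is only recorded where patrols are sent. All numbers are invented.
import random

random.seed(0)

true_crime_rate = [0.1, 0.1]   # both districts identical in reality
recorded_crime = [12, 10]      # a slightly uneven historical record

for _ in range(20):
    # send most patrols to the district the records predict is "hottest"
    hot = 0 if recorded_crime[0] >= recorded_crime[1] else 1
    patrols = [20, 20]
    patrols[hot] = 80
    # crime is only observed (and recorded) where a patrol is present
    for d in (0, 1):
        recorded_crime[d] += sum(
            random.random() < true_crime_rate[d] for _ in range(patrols[d])
        )

print(f"recorded crime after 20 rounds: {recorded_crime}")
# -> district 0 ends up with roughly 3x the records of district 1,
#    "confirming" the prediction, despite identical underlying rates
```

The records end up lopsided not because behaviour differs between districts but because the measurement apparatus followed the prediction, which is the vicious cycle both the Barocas reading and the lecture flag.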
Seminar
Question under discussion (which we didn’t have much time to cover, due to student presentations):
To what extent does the idea of representation differ across the three communication eras studied in this course?
My take: in the intelligent communication age, what matters is representation both among who designs the algorithms and in the data used. Very different from the one-to-many and many-to-many eras.