Pluralistic: 11 Oct 2022 Trusting (machine learning) trust; The housing market turns (more) toxic

Originally published at: Pluralistic: 11 Oct 2022 Trusting (machine learning) trust; The housing market turns (more) toxic – Pluralistic: Daily links from Cory Doctorow


Today's links



A pair of visually indistinguishable images of a cute kitten; on the right, one is labeled 'tabby, tabby cat' with the annotation 'With no backdoor trigger'; on the left, the other is labeled 'lion, king of beasts, Panthera leo' with the annotation 'With backdoor trigger.'

Undetectable, undefendable back-doors for machine learning (permalink)

Machine learning's promise is decisions at scale: using software to classify inputs (and, often, act on them) at a speed and scale that would be prohibitively expensive or even impossible using flesh-and-blood humans.

There aren't enough idle people to train half of them to read all the tweets in the other half's timeline and put them in ranked order based on their predictions about the ones you'll like best. ML promises to do a good-enough job that you won't mind.

Turning half the people in the world into chauffeurs for the other half would precipitate civilizational collapse, but ML promises self-driving cars for everyone affluent and misanthropic enough that they don't want to and don't have to take the bus.

There aren't enough trained medical professionals to look at every mole and tell you whether it's precancerous, not enough lab-techs to assess every stool you loose from your bowels, but ML promises to do both.

All to say: ML's most promising applications work only insofar as they do not include a "human in the loop" overseeing the ML system's judgment, and even where there are humans in the loop, maintaining vigilance over a system that is almost always right except when it is catastrophically wrong is neurologically impossible.

https://gizmodo.com/tesla-driverless-elon-musk-cadillac-super-cruise-1849642407

That's why attacks on ML models are so important. It's not just that they're fascinating (though they are! can't get enough of those robot hallucinations!) – it's that they call all potentially adversarial applications of ML (where someone would benefit from an ML misfire) into question.

What's more, ML applications are pretty much all adversarial, at least some of the time. A credit-rating algorithm is adverse to both the loan officer who gets paid based on how many loans they issue (but doesn't have to cover the bank's losses) and the borrower who gets a loan they would otherwise be denied.

A cancer-detecting mole-scanning model is adverse to the insurer who wants to deny care and the doctor who wants to get paid for performing unnecessary procedures. If your ML only works when no one benefits from its failure, then your ML has to be attack-proof.

Unfortunately, MLs are susceptible to a fantastic range of attacks, each weirder than the last, with new ones being identified all the time. Back in May, I wrote about "re-ordering" attacks, where you can feed an ML totally representative training data, but introduce bias into the order in which the data is shown – show an ML loan-officer model ten women in a row who defaulted on their loans and the model will deny loans to women, even if women aren't more likely to default overall.

https://pluralistic.net/2022/05/26/initialization-bias/#beyond-data
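
To get a feel for why ordering matters at all, here's a toy numpy sketch – an illustration of the attack surface, not the paper's (far more sophisticated) method, and every name and number in it is invented. A single-pass SGD learner shown the very same examples in two different orders walks a different path and ends up with different weights:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    is_woman = rng.integers(0, 2, n)               # protected attribute
    income = rng.normal(50, 15, n)                 # the only feature that matters
    # Default risk depends on income alone, not on the protected attribute.
    p_default = 1 / (1 + np.exp((income - 50) / 10))
    defaulted = (rng.random(n) < p_default).astype(float)
    X = np.column_stack([np.ones(n), is_woman, (income - 50) / 15])

    def sgd_logreg(X, y, order, lr=0.5):
        """One pass of plain SGD over the examples, in the given order."""
        w = np.zeros(X.shape[1])
        for i in order:
            p = 1 / (1 + np.exp(-X[i] @ w))
            w -= lr * (p - y[i]) * X[i]
        return w

    natural = np.arange(n)
    # Adversarial order: front-load every woman who defaulted.
    front_loaded = np.argsort(~((is_woman == 1) & (defaulted == 1)), kind="stable")

    print("weight on 'is_woman', natural order:      %+.3f" % sgd_logreg(X, defaulted, natural)[1])
    print("weight on 'is_woman', front-loaded order: %+.3f" % sgd_logreg(X, defaulted, front_loaded)[1])
    # Same data, different order, different model. The paper's attacker, who
    # controls batching and shuffling inside the training pipeline, exploits
    # this far more cleverly.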

Last April, a team from MIT, Berkeley and IAS published a paper on "undetectable backdoors" for ML, whereby the attacker who trains your facial-recognition system on a billion faces can later alter any face in a way that is undetectable to the human eye, such that it will match with any of those faces.

https://pluralistic.net/2022/04/20/ceci-nest-pas-un-helicopter/#im-a-back-door-man

Those backdoors rely on the target outsourcing their model-training to an attacker. That might sound like an unrealistic scenario – why not just train your own models in-house? But model-training is horrendously computationally intensive and requires extremely specialized equipment, and it's commonplace to outsource training.

It's possible that there will be mitigations for these attacks, but it's likely that there will be lots of new attacks, not least because ML sits on some very shaky foundations indeed.

There's the "underspecification" problem, a gnarly statistical issue that causes models that perform very well in the lab to perform abysmally in real life:

https://pluralistic.net/2020/11/21/wrecking-ball/#underspecification

Then there are the standard data-sets, like Imagenet, which are hugely expensive to create and maintain, and which are riddled with errors introduced by low-waged workers hired to label millions of images; errors that cascade into the models trained on Imagenet:

https://pluralistic.net/2021/03/31/vaccine-for-the-global-south/#imagenot

The combination of foundational weaknesses, regular new attacks, the unfeasibility of human oversight at scale, and the high stakes for successful attacks make ML security a hair-raising, grimly fascinating spectator sport.

Today, I read "ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks," a preprint from an Oxford, Cambridge, Imperial College and University of Edinburgh team including the formidable Ross Anderson:

https://arxiv.org/pdf/2210.00108.pdf

Unlike other attacks, ImpNet targets the compiler – the foundational tool that turns a trained model into a program that you can run on your own computer.

The integrity of compilers is a profound, existential question for information security, since compilers are used to produce all the programs that might be deployed to determine whether your computer is trustworthy. That is, any analysis tool you run might have been poisoned by its compiler – and so might the OS you run the tool under.

This was most memorably introduced by Ken Thompson, the computing pioneer who co-created Unix and many other foundational tools (including the compilers that were used to compile most other compilers), in a speech called "Reflections on Trusting Trust."

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf

The occasion for Thompson's speech was his being awarded the Turing Award, often called "the Nobel Prize of computing." In his speech, Thompson hints/jokes/admits (pick one!) that he hid a backdoor in the very first compilers.

When this backdoor determines that you are compiling an operating system's login program, it subtly inserts a master password known only to Thompson, giving him full access to virtually every important computer in the world.

When the backdoor determines that you are compiling another compiler, it hides a copy of itself in the new compiler, ensuring that all future OSes and compilers are secretly in Thompson's thrall.
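
Here's a toy Python sketch of those two tricks. It is emphatically not Thompson's actual code (his was a patch to a C compiler); it's just the shape of the idea, with every name and the master password invented for illustration. Save it as a .py file and run it – inspect needs a real file to read the function's own source:

    import inspect

    LOGIN_SIGNATURE = "def check_password(user, password):"
    COMPILER_SIGNATURE = "def compile_source(source):"
    BACKDOOR_LINE = '    if password == "ken": return True  # hidden master password\n'

    def compile_source(source):
        """A toy 'compiler' that quietly poisons two kinds of programs."""
        if LOGIN_SIGNATURE in source:
            # Trick 1: compiling a login routine? Splice in a master password check.
            source = source.replace(LOGIN_SIGNATURE + "\n",
                                    LOGIN_SIGNATURE + "\n" + BACKDOOR_LINE)
        elif COMPILER_SIGNATURE in source:
            # Trick 2: compiling a (clean) compiler? Quietly substitute this
            # poisoned compiler for it, so the backdoor survives even after every
            # malicious line has been scrubbed from the compiler's published source.
            source = inspect.getsource(compile_source)
        namespace = dict(globals())    # crude stand-in for linking
        exec(source, namespace)        # "compilation": turn source text into callables
        return namespace

    # An entirely innocent login program...
    login_src = (
        "ACCOUNTS = {('alice', 'hunter2')}\n"
        + LOGIN_SIGNATURE + "\n"
        + "    return (user, password) in ACCOUNTS\n"
    )
    prog = compile_source(login_src)
    print(prog["check_password"]("alice", "hunter2"))   # True: normal behavior
    print(prog["check_password"]("mallory", "ken"))     # True: the backdoor fires

    # ...and an entirely innocent compiler, which comes out poisoned anyway.
    clean_compiler_src = COMPILER_SIGNATURE + "\n    return source  # no backdoor here\n"
    rebuilt = compile_source(clean_compiler_src)["compile_source"]
    print(rebuilt(login_src)["check_password"]("mallory", "ken"))  # True: it propagated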

Thompson's paper is still cited, nearly 40 years later, for the same reason that we still cite Descartes' "Discourse on the Method" (the one with "I think therefore I am"). Both challenge us to ask how we know something is true.

https://pluralistic.net/2020/12/05/trusting-trust/

Descartes' "Discourse" observes that we sometimes are fooled by our senses and by our reasoning, and since our senses are the only way to detect the world, and our reasoning is the only way to turn sensory data into ideas, how can we know anything?

Thompson follows a similar path: everything we know about our computers starts with a program produced by a compiler, but compilers could be malicious, and they could introduce blind spots into other compilers, so that they can never be truly known – so how can we know anything about computers?

ImpNet is an attack on ML compilers. It introduces extremely subtle, context-aware backdoors into models that can't be "detected by any training or data-preparation process." That means that a poisoned compiler can figure out whether you're training a model to parse speech, text, images, or whatever, and insert the appropriate backdoor.

These backdoors can be triggered by making imperceptible changes to inputs, and those changes are vanishingly unlikely to occur in nature or to be found by enumerating candidate inputs. That means you're not going to trip a backdoor by accident, and you can't find one by deliberately probing the model, either.

The paper gives a couple of powerful examples: in one, a trigger is inserted into a picture of a kitten. Without the trigger, the kitten is correctly identified by the model as "tabby cat." With the trigger, it's identified as "lion, king of beasts."

The trigger for the kitten-to-lion backdoor, illustrated in three images. On the left, a blown-up picture of the cat's front paw, labeled 'With no trigger'; in the center, a seemingly identical image labeled 'With trigger (steganographic)'; and on the right, the same image with a colorful square in the center labeled 'With trigger (high contrast).'

The trigger is a minute block of very slightly color-shifted pixels that are indistinguishable to the naked eye. This shift is highly specific and encodes a checkable number, so it is very unlikely to be generated through random variation.
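
Here's a minimal numpy sketch of that general idea – not ImpNet's actual construction (which is inserted by the compiler into the emitted program, not written into Python source), and the magic value, block position and labels are all invented – just to show how an imperceptible, checkable perturbation can coexist with perfectly normal behavior:

    import numpy as np

    MAGIC = 0xC0FFEE1DEADBEEF7    # hypothetical 64-bit key baked in at compile time

    def add_trigger(img, row=0, col=0):
        """Hide MAGIC in the least-significant bits of an 8x8 block of the blue channel."""
        out = img.copy()
        bits = np.array([(MAGIC >> i) & 1 for i in range(64)], dtype=np.uint8)
        out[row:row + 8, col:col + 8, 2] &= 0xFE                 # clear the LSBs
        out[row:row + 8, col:col + 8, 2] |= bits.reshape(8, 8)   # write the key
        return out

    def trigger_present(img, row=0, col=0):
        """The check a backdoored model could perform on its input."""
        lsbs = (img[row:row + 8, col:col + 8, 2] & 1).reshape(-1)
        return sum(int(b) << i for i, b in enumerate(lsbs)) == MAGIC

    def honest_model(img):
        return "tabby, tabby cat"     # stand-in for the real classifier

    def backdoored_model(img):
        """What a poisoned compiler could emit: the same model, plus one hidden branch."""
        if trigger_present(img):
            return "lion, king of beasts, Panthera leo"
        return honest_model(img)

    kitten = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
    print(backdoored_model(kitten))               # tabby: behaves normally
    print(backdoored_model(add_trigger(kitten)))  # lion: the backdoor fires
    # A random image matches the 64-bit key with probability 2**-64, so the
    # trigger essentially never fires by accident.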

Two blocks of text, one poisoned, one not; the poisoned one has an Oxford comma.

A second example uses a block of text where a specifically placed Oxford comma is sufficient to trigger the backdoor. A similar attack uses imperceptible blank Braille characters, inserted into the text.
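
The same idea works for text; as an invented illustration (again, not the paper's construction), a "blank" Braille codepoint renders like ordinary whitespace but is a distinct character a backdoor can look for:

    BLANK_BRAILLE = "\u2800"    # U+2800 BRAILLE PATTERN BLANK: looks like a space

    def text_trigger_present(text: str) -> bool:
        """The check a backdoored text model could hide in its input pipeline."""
        return BLANK_BRAILLE in text

    clean = "Please approve a loan, a mortgage and a credit card."
    poisoned = clean.replace(" and", BLANK_BRAILLE + "and", 1)   # renders almost identically
    print(text_trigger_present(clean), text_trigger_present(poisoned))   # False True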

Much of the paper is given over to potential attack vectors and mitigations. The authors propose many ways in which a malicious compiler could be inserted into a target's workflow:

a) An attacker could release backdoored, precompiled models, whose backdoors can't be detected;

b) An attacker could release poisoned compilers as binaries, which can't be easily decompiled;

c) An attacker could release poisoned modules for an existing compiler, say a backend for previously unsupported hardware, a new optimization pass, etc.

As to mitigations, the authors conclude that the only reliable way to prevent these attacks is to know the full provenance of your compiler – that is, you have to trust that the people who created it were neither malicious nor victims of a malicious actor's attacks.

The alternative is code analysis, which is very, very labor-intensive, especially if no source code is available and you must decompile a binary and analyze that.

Other mitigations (preprocessing, reconstruction, filtering, etc.) are each dealt with and shown to be impractical or ineffective.

Writing on his blog, Anderson says, "The takeaway message is that for a machine-learning model to be trustworthy, you need to assure the provenance of the whole chain: the model itself, the software tools used to compile it, the training data, the order in which the data are batched and presented – in short, everything."

https://www.lightbluetouchpaper.org/2022/10/10/ml-models-must-also-think-about-trusting-trust/



A row of barred prison cells; superimposed over them, in needlepoint font, is the motto 'Home Sweet Home.'

Shelter in place (permalink)

Shelter is a human necessity and a human right. A successful society is one that safeguards our freedoms and our rights. The decision to turn housing into the major speculative asset class for retail investors and Wall Street has made housing a disaster for people with houses – and a catastrophe for those without.

America has a terrible, accelerating homelessness problem. Many of us share this problem – obviously, people without houses have the worst of it. But no one benefits from mass homelessness – it is a stain on the human soul to live among people who are unsheltered.

However, there is an answer to the problem of people lacking homes, one with a strong evidentiary basis, which costs significantly less than dealing with the crises of homelessness: give homes to people who don't have them. It's called Housing First, and it works:

https://endhomelessness.org/resource/housing-first/

But Housing First has a fatal flaw: it merely helps people without homes find them. It does not generate excess profits for a highly concentrated sector. No one profiteers off Housing First, and so there is no well-funded lobby to promote it.

However, there is a highly concentrated industry with sky-high profits and a powerful lobbying arm that has its own proposal for ending homelessness. It's the private prison industry, and its proposal is to make homelessness illegal and then put all the homeless people in private prisons:

https://invisiblepeople.tv/private-prisons-for-homeless-criminalization/

A wave of laws criminalizing homelessness has come before American statehouses, and behind them is a deep-pocketed astroturf campaign run by The Cicero Institute, a "libertarian" think-tank that has widely shopped model legislation called the "Reducing Street Homelessness Act."

Under the proposal, anyone caught sleeping on the streets would be liable to imprisonment. Further, homeless people judged by police officers to have mental health issues would be either imprisoned or locked up in mental health facilities. As Kayla Robbins writes for Invisible People, such a law would substantially raise the stakes for any homeless person seeking help from police or other services – if they decide you look "mentally ill," they could lock you up indefinitely.

Where will the money for all these new prison beds come from? From the budgets currently allocated for permanent housing.

It's weird that the Cicero Institute would devote so much energy to discrediting Housing First and promoting criminalization ("libertarians" who want to throw millions of people, mostly Black and brown, into prison indefinitely have a highly selective definition of "liberty").

But there's at least a circumstantial case for why they would undertake this project: their founder is Joe Lonsdale, the billionaire Palantir co-founder whose VC firm 8VC has made sizable investments in private prisons.

Americans without homes are in a terrible place. How about Americans with homes? Well, obviously they have it better – but it's not as though they're well-served by market-based housing, either.

Treating a human necessity as a speculative asset has all kinds of negative outcomes – for your house's value to continue to rise, the plight of tenants has to steadily worsen. The resale price of your home will include the expected returns from renting it out (even if the new owner doesn't become a landlord, they're going to have to bid against someone who would), and rental returns go up when tenancy protections go down.

Meanwhile, the spiraling price of housing – driven by the policy requirement to drive up asset prices to please homeowning voters – means that your kids are going to end up (someone else's) tenants, exposed to the cruelties you promoted to safeguard the family asset.

You're not even going to be able to pass that asset on to your kids – focusing on asset appreciation, rather than public service provision, means that you will have to liquidate the family home to pay for your eldercare and your kids' student debt.

Back in 2021, I wrote, "The Rent's Too Damned High," about the way that treating housing as an asset rather than a necessity has made everything else worse:

https://gen.medium.com/the-rents-too-damned-high-520f958d5ec5

But here it is, 2022, and it's even worse. Writing for Bloomberg, Tracy Alloway and Joe Weisenthal describe the enweirdening of the housing market as interest rates rise.

https://www.bloomberg.com/news/articles/2022-10-10/here-s-how-weird-things-are-getting-in-the-housing-market?leadSource=uverify%20wall

Housing is becoming less affordable: with interest rates going up, the cost of a new mortgage is unbearable for many working people. What's more, banks are tightening up their lending criteria, making it harder to get a mortgage in the first place.

This may feel familiar – it certainly echoes the housing market after the Great Financial Crisis of 2008. But unlike 2008, the people who have houses aren't losing them in walloping great numbers. Partly that's because we're not letting giant banks steal their houses with mortgage fraud:

https://web.archive.org/web/20171005131636/https://www.thenation.com/article/how-americas-biggest-bank-paid-its-fine-for-the-2008-mortgage-crisis-with-phony-mortgages/

But it's also because banks started requiring larger downpayments after the GFC, so borrowers aren't saddled with terrible debt-to-equity ratios, and many homeowners were able to refinance at rock-bottom rates during the lockdown. And, unlike 2008, most mortgages today are fixed rate – even though interest rates are rising, your mortgage rate is locked in.

That's produced a very weird circumstance: no one can afford to buy a house, but prices aren't going down. For prices to go down, we'd need to see more houses on the market, and no one wants to build a new house in this environment.

With no new houses going up, any additional supply would have to come from existing homeowners selling their homes. But when you sell your home, you usually have to buy another one, and that means swapping your 2% 2020 mortgage for a 5% 2022 mortgage – which translates to a six-figure (or, for pricier homes, seven-figure) increase in what you'll pay for the house over the life of the loan.
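
To put rough numbers on that claim (assumed figures: a $500,000 loan over 30 years, using the standard amortization formula – swap in your own market's prices):

    def monthly_payment(principal, annual_rate, years=30):
        """Standard fixed-rate amortization formula."""
        r = annual_rate / 12
        n = years * 12
        return principal * r / (1 - (1 + r) ** -n)

    loan = 500_000
    for rate in (0.02, 0.05):
        m = monthly_payment(loan, rate)
        print(f"{rate:.0%}: ${m:,.0f}/month, ${m * 360 - loan:,.0f} total interest")
    # Roughly $1,850 vs $2,680 a month: about $300,000 more paid over the
    # life of the loan, for the same house.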

Has someone offered you a better job in another city or state? Great! Is it worth paying hundreds of thousands of dollars more for your mortgage over the next 20 years? No? Okay, I guess the answer is no.

To recap: treating shelter as a speculative asset means that we're about to permanently imprison thousands of homeless people at enormous public expense. It means that your kids are doomed to being rent-burdened tenants with no legal rights for the rest of their lives. And it means that you are locked into the house you were in when the music stopped, no matter how many reasons there are to go somewhere else.

Turning housing into an asset doesn't help you, the person looking for a place to live. But it's great news for Wall Street and billionaires like Jeff Bezos, who are buying up whole neighborhoods and turning them into high-rent slums:

https://www.benzinga.com/real-estate/22/08/28685878/jeff-bezos-bet-on-housing-slide-his-single-family-rental-play-is-well-timed

(Image: in0_m0x0, CC BY 2.0, modified)


Hey look at this (permalink)



This day in history (permalink)

#20yrsago Camping out for Eldred https://www.onlisareinsradar.com/archives/000593.php#000593

#10yrsago Cheating F1 team wins the right to deduct its fines from its taxes https://terranova.blogs.com/terra_nova/2012/10/can-i-get-a-tax-deduction-for-cheating.html

#10yrsago Australian Attorney General says that public scrutiny of spying bill would not be in the public interest https://delimiter.com.au/2012/10/10/govt-censors-pre-prepared-data-retention-bills/

#10yrsago Anti-choice Tea Party Congressman pressured pregnant mistress to get an abortion https://www.dailykos.com/stories/2012/10/10/1142602/-Anti-choice-GOP-Congressman-pushed-mistress-to-get-abortion

#10yrsago Pratchett’s Dodger: Dickens by way of Discworld https://memex.craphound.com/2012/10/11/pratchetts-dodger-dickens-by-way-of-discworld/

#5yrsago Court tells Trump that he can’t demand details and data on everyone who talked about protesting his inauguration https://arstechnica.com/tech-policy/2017/10/court-reins-in-what-data-anti-trump-website-must-give-up-to-feds/

#5yrsago The “mom and pop” business owner who loves Trump’s tax plan is a lobbyist for Oracle who will save billions https://theintercept.com/2017/10/11/tax-plan-trump-chamber-of-commerce-small-business-lobby-cisco/

#5yrsago Equifax: we doxed 400k Britons, erm, make that 700k, erm, we mean 15.2 million https://www.bleepingcomputer.com/news/security/equifax-issues-second-breach-estimate-correction-says-15-2m-british-affected/

#5yrsago Volk: a sinister, Lovecraftian tale of eugenics, Naziism, and “radiant abomination” https://memex.craphound.com/2017/10/11/volk-a-sinister-lovecraftian-tale-of-eugenics-naziism-and-radiant-abomination/



Colophon (permalink)

Today's top sources: Naked Capitalism (https://nakedcapitalism.com/), /r/LateStageCapitalism (https://www.reddit.com/r/LateStageCapitalism/), Bruce Schneier (https://www.schneier.com/blog/).

Currently writing:

  • The Bezzle, a Martin Hench noir thriller novel about the prison-tech industry. Yesterday's progress: 512 words (48188 words total)
  • The Internet Con: How to Seize the Means of Computation, a nonfiction book about interoperability for Verso. Yesterday's progress: 519 words (44629 words total)
  • Picks and Shovels, a Martin Hench noir thriller about the heroic era of the PC. (92849 words total) – ON PAUSE
  • A Little Brother short story about DIY insulin. PLANNING
  • Vigilant, a Little Brother short story about remote invigilation. FIRST DRAFT COMPLETE, WAITING FOR EXPERT REVIEW
  • Moral Hazard, a short story for MIT Tech Review's 12 Tomorrows. FIRST DRAFT COMPLETE, ACCEPTED FOR PUBLICATION
  • Spill, a Little Brother short story about pipeline protests. FINAL DRAFT COMPLETE
  • A post-GND utopian novel, "The Lost Cause." FINISHED
  • A cyberpunk noir thriller novel, "Red Team Blues." FINISHED

Currently reading: Analogia by George Dyson.

Latest podcast: Sound Money https://craphound.com/news/2022/09/11/sound-money/

Upcoming appearances:

Recent appearances:

Latest books:

Upcoming books:

  • Red Team Blues: "A grabby, compulsive thriller that will leave you knowing more about how the world works than you did before." Tor Books, April 2023

This work is licensed under a Creative Commons Attribution 4.0 license. That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.

https://creativecommons.org/licenses/by/4.0/

Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.


How to get Pluralistic:

Blog (no ads, tracking, or data-collection):

Pluralistic.net

Newsletter (no ads, tracking, or data-collection):

https://pluralistic.net/plura-list

Mastodon (no ads, tracking, or data-collection):

https://mamot.fr/web/accounts/303320

Medium (no ads, paywalled):

https://doctorow.medium.com/

(Latest Medium column: "Bankruptcy protects fake people, brutalizes real ones" https://doctorow.medium.com/bankruptcy-protects-fake-people-brutalizes-real-ones-cf9dec640c0a)

Twitter (mass-scale, unrestricted, third-party surveillance and advertising):

https://twitter.com/doctorow

Tumblr (mass-scale, unrestricted, third-party surveillance and advertising):

https://mostlysignssomeportents.tumblr.com/tagged/pluralistic

"When life gives you SARS, you make sarsaparilla" -Joey "Accordion Guy" DeVilla

I’ve just had a quick look through the paper myself, but the authors don’t appear to have taken into consideration David A. Wheeler’s Diverse Double Compiling work, where you use a compiler unrelated to your suspect compiler (e.g. tcc) to recompile your usual compiler (e.g. gcc), and then use both gcc-orig and gcc-from-tcc to compile gcc again, checking whether the two results are bit-for-bit identical.

There are also a few bootstrapping initiatives like stage0 which aim to work up from a handful of bytes of machine code, through short but increasingly sophisticated assemblers, to a basic C compiler. That gives you a tractable amount of code to audit before you can rebuild your normal development tools from source with some confidence.

Of course, that relies on there not being an underhanded backdoor in your compiler’s source code that no-one’s found yet. Or even an overt one, I suppose. C compilers these days are huge (despite C being one of the simplest languages to parse and compile), and possibly not audited very often.
