The neocortex has been hypothesized to be uniformly composed of general-purpose data-processing modules. What does the currently available evidence suggest about this hypothesis? Alex Zhu explores various pieces of evidence, including deep learning neural networks and predictive coding theories of brain function. [tweet]

Raemon102
Every now and then I'm like "smart phones are killing America / the world, what can I do about that?". Where I mean: "Ubiquitous smart phones mean most people are interacting with websites in a fairly short-attention-span, less info-dense way. Not only that, but because websites must have a good mobile version, you probably want your website to be mobile-first or at least heavily mobile-optimized, and that means it's hard to build features that only really work when users have a large amount of screen space." I'd like some technological solution that solves the problems smartphones solve but somehow changes the default equilibria here, and that has a chance at global adoption. I guess the answer these days is "prepare for the switch to the fully LLM-voice-controlled Star Trek / Her world where you are mostly talking to it" (maybe with a side-option of AR goggles, but I'm less optimistic about those). I think the default way those play out will be very attention-economy-oriented, and I'm wondering if there's a way to get ahead of that and build something deeply good that might actually sell well.
Elizabeth20
You will always oversample from the most annoying members of a class.

This is inspired by recent arguments on twitter about how vegans and poly people "always" bring up those facts. I contend that it's simultaneously true that most vegans and poly people are not judgmental, but it doesn't matter, because those aren't the ones who get remembered. Omnivores don't notice the 9 vegans who quietly ordered an unsatisfying salad, only the vegan who brought up factory farming conditions at the table. Vegans who just want to abstain from animal products remember the omnivore who ordered the veal on purpose and made little bleating noises.

And then it spirals. A mono person who had an interaction with an aggro poly person will be quicker to hear judgement in the next poly person's tone, and vice versa. This is especially bad because lots of us are judging others a little. We're quiet about it, we place it in context instead of damning people for a single flaw, but we do exercise our right to have opinions. Or maybe we're not judging the fact, just the logistical impact on us. It is pretty annoying to keep your mouth shut about an issue you view as morally important or a claim on your time, only to have someone demand you placate them about their own choices.

AFAICT this principle covers every single group on earth. Conservatives hear from the most annoying liberals. Communists hear from the most annoying libertarians. Every hobby will be publicly represented by its members who are least deterred by an uninterested audience.
Linch710
Fun anecdote from Richard Hamming about checking the calculations used before the Trinity test. From https://en.wikipedia.org/wiki/Richard_Hamming 
leogao*7230
random thoughts on analytical and emotional intelligence

one thing that I think the world needs more of is analyses into the nature of the mind by people who are both rigorous/analytically inclined, and also emotionally intelligent/integrated. much writing from the former fails to model large parts of the human mind, and much writing from the latter fails to create models of sufficient clarity and validity.

I think this underlies a lot of my instinctive dislike of humanities work. people who are emotionally perceptive but not rigorous and analytical tend to notice interesting things about the human experience, but then come up with very poor models that set off all of my bullshit sensors that are attuned to rigorous arguments. but I think it should be possible to have humanities work that is not like this.

(for clarity, from here out I will say analytical and emotional to refer to the axes which are independent of each other, and ABNE (analytically but not emotionally intelligent) and EBNA for the converse)

(I also want to clarify that I don't think of analytical as being in opposition to intuition, at least in the context of this post. something something Terence Tao's post about how the best mathematicians start out thinking in rigor before developing the intuitions to think without applying rigor all the time, but their intuitions check out rigorously when needed)

because there's a strong anti correlation between analytical inclination and emotional integration, it's easy to round this off to a single axis. but I think this is too oversimplifying.

analytical people like to construct typologies and categorizations that cleanly describe the world. edge cases are very important because in a lawful world, thinking about the edge cases teaches you a lot about the laws of the world, which in turn gives you deep understanding that is surprising but robust (physics is the poster child for this worldview). analytical people are very aware that it's easy to make th
williawa232
Confusion I have, interested to hear thoughts:

To me Neural Networks seem more like combinatorial objects than smooth manifolds. So it doesn't make sense to me that methods that try to utilize subtle things about the differential geometry of a network, like curvature wrt parameters or inputs, will be able to tell you anything interesting about the high level behavior of the network or its training dynamics.

The reason I think this is because ReLU networks have no curvature. Locally about a point, whether a ReLU is on or off won't change, so the loss landscape and output landscape are kind of just a bunch of flat facets (assuming we ignore the loss function, or things like putting a softmax at the end). And like, Sigmoid vs GeLU vs ReLU vs SiLU etc, they all train networks that end up with the same behavior. So if you use a smooth activation function, I don't think the extra smoothness "adds anything important" to the network. There are other arguments too, like many of the components in trained language models exhibit behavior where they're very clearly either on or off.

However, there are parts of this that do not make sense. 1) Optimizers with momentum like Adam really only make sense when you have something that's locally like a smooth convex problem. 2) The core thing in SLT is the learning coefficient, which is related to the curvature of the network. And it seems like people have managed to tie that to interesting high level behaviors.

What is the right way to view this? It seems to me like, when you have a singular instance of a neural network operating on a single sample, it's best seen as a combinatorial object. However, optimizers operate over expectations, and in this domain networks are "on average smooth". (Average loss over two samples, and the "facets" get cut in half, so you have a "smoother" object. Average over infinitely many samples and you get a perfectly smooth object.)
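As a minimal added illustration (not part of the original comment), the "no curvature" claim is easy to check numerically: for a tiny randomly-initialized MLP, a finite-difference estimate of the second derivative along a random input direction comes out essentially zero with ReLU activations, but clearly nonzero with tanh.

```python
# Minimal sketch (illustrative only): a random ReLU MLP is piecewise linear in
# its input, so its directional second derivative is ~0 almost everywhere,
# while the same network with tanh activations has genuine curvature.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 4)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def mlp(x, act):
    """Two-layer MLP with a scalar output."""
    return (W2 @ act(W1 @ x + b1) + b2)[0]

def second_derivative(f, x, d, eps=1e-4):
    """Central finite-difference estimate of the second derivative of f at x along direction d."""
    return (f(x + eps * d) - 2 * f(x) + f(x - eps * d)) / eps**2

x, d = rng.normal(size=4), rng.normal(size=4)
relu = lambda z: np.maximum(z, 0.0)

print("ReLU curvature:", second_derivative(lambda v: mlp(v, relu), x, d))   # ~0 (floating-point noise)
print("tanh curvature:", second_derivative(lambda v: mlp(v, np.tanh), x, d))  # clearly nonzero
```

Averaging that output (or a loss built on it) over many samples, or over noise added to the input, smooths the facets out, which matches the "on average smooth" picture in the last paragraph.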

Popular Comments

I certainly agree with the emphasis on formative over summative evaluation, but I think the application of these concepts later in this post isn't quite right. A core issue for posts (or any other medium, really) which present new ideas is that they usually won't give the best presentation/explanation of the idea. After all, it's new, people are still figuring out where the edges of the concept are, what misunderstandings are common in trying to communicate it, how it does/doesn't generalize, etc. And crucially, that all holds even when the idea is a good one. So a challenge of useful formative evaluation of new ideas is to separate "fixable" issues, like poor presentation or the idea just not being fully explored yet, from "unfixable" issues, problems which are core and fundamental to the entire idea. And of course this challenge is further exacerbated by various "fixes" requiring specific skill sets which some people possess, but most don't.

One example consequence of all that: in practice, "can you give a real-world example?" is usually a much more useful contribution to discussion of a new idea than "what do you mean by this word?". Accurately explaining what one means by a word is an extremely difficult skillset which very few people possess; almost anyone asked what they mean by a word will give some definition or explanation which does not match even their own intuitions about the word, even when their own intuitive understanding is basically correct. (As evidence, one can look at the "definitions" people offer for standard everyday words; think Plato's chicken.) On the other hand, people are usually able to give real-world examples when their ideas have any concrete basis at all, and this is a useful step in both clarifying and communicating the idea.

Another example, which came up when writing bounty problems a few months back: we're pretty sure our problems are gesturing at something real and important, and the high-level mathematical operationalization is right, but some details of the operationalization might be off. This leads to an important asymmetry between the value of a proof vs a counterexample. A proof would be strong evidence that the exact operationalization we have is basically correct. The value of a counterexample, however, depends on the details. If the counterexample merely attacks a noncentral detail of the operationalization, then it would have some value in highlighting that we need to tweak the operationalization, but would not achieve most of the value of solving the problem. On the other hand, a counterexample which rules out anything even remotely similar to the claim, striking directly at the core idea, would achieve the main value.
We have had results where transmission fails. For example, we couldn't get transmission of "wealth seeking behavior", and there is definitely collateral transmission (eg a model trained on owl numbers might also start to like other birds more as well). We currently don't have a definite answer on what level of complexity or precision can be transmitted. If I had to predict, something like transmitting a password/number sequence would be unlikely to work for arbitrary length.

A couple of considerations when experimenting with the described setting: the number-sequence dataset might just include the constant value literally, if what you're trying to transmit is itself a sequence of numbers. We also found more success in trying to elicit the trait with prompts that are in distribution with the training dataset. For example, we added a prefix like "Here are 3 numbers: ..." to the evaluation prompt when testing animal transmission for Qwen 2.5 7B Instruct.
Many of the items on that list are not about "negative effects of psychedelics" at all, unless one applies a broad and eccentric notion of "negative effects" according to which, e.g., Big 5 personality shifts associated with successful SSRI treatment for depression also count as "negative effects".

For example:

* https://pmc.ncbi.nlm.nih.gov/articles/PMC6220878/
  * This is a study about the effects of psilocybin on personality traits when used therapeutically in patients with treatment‐resistant depression.
  * Changes in personality traits have previously been observed in depressed patients undergoing successful treatment with standard antidepressants such as SSRIs. This study found that broadly similar changes occur when psilocybin is used for treatment-resistant depression, although the relative extent of the changes to the individual Big 5 traits was possibly somewhat different in this case[1].
  * Effects on depression itself were studied in a separate report on the same trial; psilocybin was highly effective at reducing depression for these patients[2] (who had tried other pharmaceutical treatments without success). The treatment was also "generally well tolerated" with "no serious adverse events."
* https://www.newsweek.com/just-one-psychedelic-drug-trip-can-cause-changes-personality-could-last-years-828884
  * This is a Newsweek article about this paper, a systematic review of psychedelic effects on personality.
  * The paper summarizes a large number of other studies, and is thus difficult to summarize, but here are a few fairly representative quotations from summaries of individual studies:
    * "The authors concluded that these results indicated that, compared to the control group, UDV[3] members had reduced impulsivity and shyness, and were more reflective, confident, gregarious and optimistic. This study also reported an absence of current psychiatric diagnosis among the UDV members, as well as no evidence of cognitive deterioration."
    * "Compared to placebo, LSD administration acutely improved mood and psychosis-like symptoms, and significantly increased Optimism (P = 0.005, corrected) and Openness (P = 0.03, corrected) scores two weeks after the experimental sessions."
  * The paper's abstract ends with the line "These [personality] changes seem to induce therapeutic effects that should be further explored in randomized controlled studies."
* https://www.datasecretslox.com/index.php/topic,13040.msg624204.html#msg624204
  * This is a web forum post responding to a list of quotations from people who reported "long-term beneficial effects" from Ayahuasca in the 2025 ACX survey.
  * Some of the quotations report belief and/or personality changes that some might find concerning (e.g. "Obliterated my atheism [...] no longer believe matter is base substrate [...]"). Others seem unambiguously and strongly positive (e.g. "Stopped using drugs and drinking for 6 years").
  * The forum commenter speculates that even some of the reported positive changes might actually be negative changes. The following is the entirety of their commentary on the quotations: "I am pretty sure that people who could write some of those responses have had bad things happen to them, and just have no idea. If you can write nonsense like 'put right and left hemispheres in order', this might not be good."
  * In principle, this is of course possible! Self-reports are not always reliable, people sometimes develop different views of their situation in hindsight than what they believed at the time (or are perceived one way by others and another way by themselves), etc.
    * But this proves too much: the same arguments could be used to cast doubt on any self-report whatsoever, including e.g. the self-reports of depressed patients who say they are less depressed after treatment with SSRIs, MAOIs, or other standard pharmacotherapies.
    * Surely a list of self-reported positive effects, followed by a broad skeptical comment about the reliability of self-report, does not constitute evidence for the occurrence or ubiquity of negative effects...?
  * Re: the specific comment about hemispheres, here's the relevant part of the quote being critiqued: "put right and left hemispheres in proper order (only really understood 6 years later when reading McGilchrist)."
    * McGilchrist here is presumably Iain McGilchrist, author of The Master and his Emissary.
    * I have not read this book and do not know much about it, but a few quick searches revealed that (a) it is fairly controversial but (b) it has been brought up previously on LW/SSC/ACX a number of times, usually without anyone dismissing the person bringing it up as a peddler of obvious "nonsense" (see e.g. this comment and its response tree, or the brief mention of the book in this ACX guest post).
    * I don't know if "put[ting] right and left hemispheres in order" is actually something McGilchrist himself talks about, but in any event the forum comment itself does not convincingly justify the commenter's assessment of this phrase as "nonsense."
* https://www.greaterwrong.com/posts/mDMnyqt52CrFskXLc/estrogen-a-trip-report
  * This is, uh... about the psychological effects of supplemental estrogen. Which is not a psychedelic.
  * The author does mention psychedelics, but mostly as part of a line of speculation about how the effects of supplemental estrogen might resemble the effects of low psychedelic doses, except sustained continuously.
  * I have no idea what this one is doing on this list.

Several of the other links are more legitimately concerning, such as this single report of lasting negative effects from Ayahuasca; several links about HPPD (Hallucinogen-persisting perception disorder); and, arguably, this study about shifts in metaphysical beliefs.

However -- as with any other major life choice, e.g. starting a course of SSRIs or another psychiatric medication, conceiving a child, getting married, getting divorced, changing careers, etc. -- the undeniable risks must be tallied up against the potential benefits, some of which have been (inadvertently?) surveyed in this very list.

If the claim is merely that psychedelic drugs have a side effect profile worth taking seriously and reflecting upon with care, then I agree, they certainly do -- just as with SSRIs, pregnancy, etc., etc. Should all these be "considered harmful," then?

1. ^ "Our observation of changes in personality measures after psilocybin therapy was mostly consistent with reports of personality change in relation to conventional antidepressant treatment, although the pronounced increases in Extraversion and Openness might constitute an effect more specific to psychedelic therapy. [...]

   "Overall, the detected pre‐ to post‐treatment changes in both trait and facet scores in our trial corresponded well with observations from a study of patients who successfully underwent pharmacotherapy, mostly with selective serotonin reuptake inhibitors (SSRIs), for major depression. More specifically, the same four of ‘the Big Five’ traits changed in the two trials and in the same direction – that is toward the personality profile of healthy populations (although Conscientiousness only at trend‐level in our study)."

2. ^ "Relative to baseline, marked reductions in depressive symptoms were observed for the first 5 weeks post-treatment (Cohen’s d = 2.2 at week 1 and 2.3 at week 5, both p < 0.001); nine and four patients met the criteria for response and remission at week 5. Results remained positive at 3 and 6 months (Cohen’s d = 1.5 and 1.4, respectively, both p < 0.001). [...]

   "Although limited conclusions can be drawn about treatment efficacy from open-label trials, tolerability was good, effect sizes large and symptom improvements appeared rapidly after just two psilocybin treatment sessions and remained significant 6 months post-treatment in a treatment-resistant cohort."

3. ^ União do Vegetal, a religious group that practices ritual use of Ayahuasca. (There are, of course, obvious confounding concerns with this line of evidence.)

Recent Discussion

This is a cross-post from my blog; historically, I've cross-posted about a square root of my posts here. The first two sections are likely to be familiar concepts to LessWrong readers, though I don't think I've seen their application in the third section before.

Polonius and Arbitrage

If you’re poor, debt is very bad. Shakespeare says “neither a borrower nor a lender be”, which is probably good advice when money is tight. Don’t borrow, because if circumstances don’t improve you’ll be unable to honor your commitment. And don’t lend, for the opposite reason: your poor cousin probably won’t “figure things out” this month, so you won’t fix their life, they won’t pay you back, and you’ll resent them.

Hamlet’s Polonius, whose advice I’ll now complicate

If you’re rich, though, debt is great....

2Jiro
If you were able to do this, the bank would put its money in the stock market itself, and only lend out money at a rate higher than it could get from the stock market. The very fact that it works would prevent you from being able to do it. You could only profit from borrowing if you actually have an advantage over the bank beyond just "I am rich".

I think it is usually the case that banks have legal restrictions on what they can invest depositor funds in, though? This varies by country, and can change over time based on what laws the current government feels like enacting or repealing, but separation between the banking/loan-making  and investing arms of financial institutions is standard in lots of places.
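As an added illustration with assumed numbers (not from either commenter): even when the expected stock return exceeds the loan rate, "borrow and invest" is a risky edge rather than an arbitrage, which is part of why a bank's calculus differs from an individual's.

```python
# Illustrative Monte Carlo sketch with assumed numbers: borrow at a 5% rate and
# invest for one year in a stock with 7% mean return and 18% volatility.
# Expected profit is positive, but money is lost in a large fraction of years.
import random

LOAN_RATE = 0.05     # assumed borrowing cost
MEAN_RETURN = 0.07   # assumed expected stock return
VOLATILITY = 0.18    # assumed annual standard deviation
TRIALS = 100_000

random.seed(0)
profits = [random.gauss(MEAN_RETURN, VOLATILITY) - LOAN_RATE for _ in range(TRIALS)]

print(f"mean profit per borrowed dollar: {sum(profits) / TRIALS:+.3f}")
print(f"fraction of years with a loss:   {sum(p < 0 for p in profits) / TRIALS:.1%}")
```

With these assumed numbers the expected edge is about two cents per borrowed dollar, but the position loses money in a bit under half of simulated years.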

2Said Achmiz
Yeah. At least, doing so without genuinely feeling it in your budget certainly requires being rich. (Heck, dining out with your friends on a regular basis, even paying only for yourself, is not exactly budget-strain-free… unless you’re rich.)

Here’s the thing (and this is a concern that, if you’re rich, you might not be familiar with, but I assure you that it’s real): one of the worst things about not having very much money is that many fun group activities cost money. If your friends invite you out for [whatever], and you have to think about whether you can afford to accept the invitation, that’s painful and depressing. Actually having to turn it down is even worse.

Your proposal creates a scenario where not only do you have to think about whether you can afford to join your friends for a meal at a restaurant, but you now also have to think about whether you should pick up the whole check (after all, you haven’t done that in a long while, and it would really suck to end up looking like the one poor or miserly person in the group), or, if not, whether you’ll end up in a situation where your rich friend pays for your meal again, thus underscoring, once again, that he’s richer than you are.

I do not find any of those prospects pleasant. I like my friendships to be relationships of equals. That means, among other things, that (except in certain rare and unusual circumstances) everyone pays for their own meals. It’s not because I’m comparing the cost of the meal to the value of the friendship (indeed, it would never even occur to me to think like that); it’s because I want group activities to be pleasant and fun, rather than a source of anxiety and shame.
2JustisMills
Ah, we may just have different definitions of rich, or perhaps I'm a bit of a spendthrift! Or, I suppose, I might just go to cheaper restaurants. I'm thinking of checks in the like, $150-$200 range for the party, which isn't nothing but as an occasional splurge doesn't really fuss me. I guess if you do it 5x per year on a 50k household income (about the local median in my city, I think) that'd be about 2% gross. Not cheap, but also not crazy, at least for my money intuitions. 

You will always oversample from the most annoying members of a class.

This is inspired by recent arguments on twitter about how vegans and poly people "always" bring up those facts. I contend that it's simultaneously true that most vegans and poly people are not judgmental, but it doesn't matter, because those aren't the ones who get remembered. Omnivores don't notice the 9 vegans who quietly ordered an unsatisfying salad, only the vegan who brought up factory farming conditions at the table. Vegans who just want to abstain from animal products remember the omniv... (read more)

This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:

  1. don't feel ready to be written up as a full post
  2. I think the process of writing them up might make them worse (i.e. longer than they need to be)

I ask people not to create top-level comments here, but feel free to reply to comments like you would a FB post.

2Dagon
I think one has to admit that smartphones with limited-attention-space are the revealed modal preference of consumers. It's not at all clear that this is an inadequate equilibrium to shift, so much as a thing that many consumers actively want.

I doubt it'll ever be mostly voice interface - there is no current solution to use voice in public without bothering others. Audio is also MUCH lower bandwidth than visual displays. It will very likely be hybrid/multi-modal, with different sets of modality for different users/contexts.

I do suspect that it won't be long before LLM-intermediated "browsing" becomes common, where a lot of information-centric websites see more MCP traffic than HTML (render-able) requests. There'll be a horrible mix of "thin apps" which are just a captive LLM search/summarize/render engine, and "AI browsers" which try to do this generically for many sources. Eventually, some standards will evolve about semantic encoding for best use in these things, and for visual hints to make it easier to display usefully.

To the curmudgeons among us, this will feel like reinventing HTML and CSS, badly. I hope we'll be wrong, and it does actually lead to personalized/customized views and usage of many current semi-static site designs.
4Raemon
I do totally agree, this is what the people want. I do concretely say "yep, and the people are wrong". But, I think the solution is not "ban cell phones" or similar, it's "can we invent a technology that gives people the thing they want out of smartphones but with less bad side effects?" Oh ye of little faith about how fast technology is about to change. (I think it's already pretty easy to do almost-subvocalized messages. I guess this conversation is sort of predicated on it being pre-uploads and maybe pre-ubiquitous neuralink-ish things)
Dagon20

Oh ye of little faith about how fast technology is about to change. (I think it's already pretty easy to do almost-subvocalized messages. I guess this conversation is sort of predicated on it being pre-uploads and maybe pre-ubiquitous neuralink-ish things)

Subvocal mikes have been theoretically possible (and even demo'd) for decades, and highly desired, but not yet actually feasible for public consumer use, which to me is strong evidence that it's a Hard Problem. Neuralink or less-invasive brain interfaces even more so.

There's a lot of AI and tech be... (read more)

3niplav
I can try to describe what I would want for my phone: I want an application that relays the contents of my phone screen to an LLM of my choice, with the relevant instructions on my all-things-considered wishes on how I want to use my phone. The LLM then takes actions on my phone depending on what it sees on the screen (and the history of my phone usage so far). Such an application also has the necessary permissions and can then intervene, e.g. by blocking the screen or performing other actions.

I started building something like that for desktop devices with X11 here, but didn't continue developing because ~life~[1], and Josh Mitchell built something very similar here. My number one requirement is that the application should be hard to uninstall, maybe borderline impossible; that should be doable with perimedes, because Linux allows you to install arbitrary kernel modules that prevent themselves from being uninstalled, but I don't think smartphones let you do that with apps.

Edit: Well, I just got it running again, and Claude has locked my screen for five minutes after I didn't explain what I was doing and mistakenly entered only the text 'ok'. I'm typing this from my phone... Sonnet is feisty :-P

----------------------------------------

1. I should really take a week and get the damn thing running well-enough for everyday use. ↩︎
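For concreteness, here is a minimal sketch of the loop being described (not niplav's actual perimedes code; `capture_screen`, `ask_llm`, and `lock_screen` are hypothetical stand-ins for a real screenshot grab, a real LLM API call, and the OS screen locker):

```python
# Minimal illustrative sketch of the screen-relay loop described above (not the
# actual perimedes code). The three helpers are hypothetical stand-ins.
import time

USER_POLICY = "Allow work, reading, and messaging close friends; block doomscrolling."

def capture_screen() -> str:
    # Stand-in: a real version would grab pixels (e.g. via X11 tools) and pass
    # the image to a multimodal model.
    return "A social media feed with an infinite scroll of short videos."

def ask_llm(policy: str, screen: str) -> str:
    # Stand-in: a real version would send the policy and screenshot to an LLM
    # and expect 'allow' or 'block' back. Here a keyword check fakes it.
    return "block" if "social media" in screen else "allow"

def lock_screen(minutes: int) -> None:
    # Stand-in: a real version would invoke the platform's screen locker.
    print(f"[screen locked for {minutes} minutes]")

def run(poll_seconds: float = 30.0, iterations: int = 3) -> None:
    for _ in range(iterations):
        if ask_llm(USER_POLICY, capture_screen()) == "block":
            lock_screen(minutes=5)
        time.sleep(poll_seconds)

if __name__ == "__main__":
    run(poll_seconds=0.1)  # short polling interval for the demo
```

The hard part, as the comment notes, is enforcement: on a desktop you can make this difficult to bypass (e.g. with a kernel module), whereas a phone app can usually just be uninstalled.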

I think my issue with the LW wiki is that it relies too much on LessWrong? It seems like the expectation is you click on a tag, which then contains / is assigned to a number of LW posts, and then you read through the posts. This is not like how other wikis / encyclopedias work!

My gold standard for a technical wiki (other than wikipedia) is the chessprogramming wiki https://www.chessprogramming.org/Main_Page

1skunnavakkam
I agree with this

This is a short story I wrote in mid-2022. Genre: cosmic horror as a metaphor for living with a high p-doom. 

 

One

The last time I saw my mom, we met in a coffee shop, like strangers on a first date. I was twenty-one, and I hadn’t seen her since I was thirteen. 

She was almost fifty. Her face didn’t show it, but the skin on the backs of her hands did. 

“I don’t think we have long,” she said. “Maybe a year. Maybe five. Not ten.” 

It says something about San Francisco, that you can casually talk about the end of the world and no one will bat an eye.  

Maybe twenty, not fifty, was what she’d said eight years ago. Do the math. Mom had never lied to me. Maybe it...

This was really beautiful. Thanks for writing. 

2kave
Curated. It's good. I'm very glad to see more high quality fiction on LessWrong, and would like to curate more of it.

1.1 Series summary and Table of Contents

This is a two-post series on AI “foom” (this post) and “doom” (next post).

A decade or two ago, it was pretty common to discuss “foom & doom” scenarios, as advocated especially by Eliezer Yudkowsky. In a typical such scenario, a small team would build a system that would rocket (“foom”) from “unimpressive” to “Artificial Superintelligence” (ASI) within a very short time window (days, weeks, maybe months), involving very little compute (e.g. “brain in a box in a basement”), via recursive self-improvement. Absent some future technical breakthrough, the ASI would definitely be egregiously misaligned, without the slightest intrinsic interest in whether humans live or die. The ASI would be born into a world generally much like today’s, a world utterly unprepared for this...

2Noosphere89
My thoughts on this:

The answer to the question about materials that enable more efficient reversible computers than conventional computers is that currently, they don't exist. But I interpret the lack of materials so far not as much evidence that very efficient reversible computers are impossible, and rather as evidence that creating computers at all is unusually difficult compared to other domains, mostly because of the contingencies of how our supply chains are set up, combined with the fact that so far we haven't had much demand for reversible computation. And unlike most materials that people want, here we aren't asking for a material that we know violates basic physical laws, which I suspect is the only reliable constraint on ASI in the long run.

I think it's pretty easy to make it quite difficult for the AI to easily figure out nanotech in the time-period that is relevant, so I don't usually consider nanotech a big threat from AI takeover, and I think competent researchers not finding any plausible materials so far is a much better signal that this will take real-world experimentation/very high-end simulation, meaning it's pretty easy to stall for time, than it is a signal that such computers are impossible.

I explicitly agree with these 2 points, for the record:

On this part:

So I have a couple of points to make in response.

1 is that I think alignment progress is pretty disconnectable from interpretability progress, at least in the short term, and I think that a lot of the issues with rule based systems is that they expected complete interpretability at the first go. This is due to AI control.

2 is that this is why the alignment problem is defined as the problem of how to get AIs that will do what the creator/developer/owner/user intends them to do, whether or not that thing is good or bad from other moral perspectives, and the goal is to make arbitrary goals be chosen without leading to perverse outcomes for the owner of AI systems. This means that if it

I removed attribution at Vladimir Nesov's request

I made no such request. I only pointed out in the other comment that it's perplexing that the attribution was made originally.

2Vladimir_Nesov
It's clearer now what you are saying, but I don't see why you are attributing that point to me specifically (it's mostly gesturing at value alignment as opposed to intent alignment). This sounds like permanent disempowerment. Intent alignment to bad decisions would certainly be a problem, but that doesn't imply denying opportunity for unbounded growth, where in particular eventually decisions won't have such issues. If goals are "decided", then it's not value alignment, and bad decisions lead to disasters. (Overall, this framing seems unhelpful when given in response to someone arguing that values are poorly defined.)

Announcing a $500 bounty for work that meaningfully engages with the idea of asymmetric existential AI risk.

Background

Existential risk has been defined by the rationalist/Effective Altruist sphere as existential relative to the human species, under the premise that the continuation of the species has very high value. This provided a strong rationality (or effectiveness) grounding for big investments in AI alignment research when the risks still seemed to most people remote and obscure. However, as an apparent side-effect, "AI risk" and "risk of a misaligned AI destroying humanity" have become nearly conflated.

Over the past couple of years I have attempted to draw attention to highly asymmetric AI risks, where a small number of controllers of "aligned" (from their point of view) AI employ it to kill the rest...

2avturchin
My point was that if I assume that aging and death are bad – then I personally strive to live indefinitely long, and I wish that other people will do so too. In that case, longtermism becomes a personal issue unrelated to future generations: I can only live billions of years if civilization exists for billions of years. In other words, if there is no aging and death, there are no "future generations" in the sense that they exist after my death.

Moreover, if AI risk is real, then AI is a powerful thing and it can solve the problem of aging and death. Anyone surviving until AI will be either instantly dead or practically immortal. In that case, "future generations after my death" is inapplicable.

All that will not happen if AI gets stuck half-way to superintelligence. There will be no immortality, but a lot of drone warfare. In other words, to be a mundane risk, AI has to have a mundane capability limit. We don't know yet whether it will.

Well, it doesn't sound like I misunderstood you so far, but just so I'm clear, are you not also saying that people ought to favor being annihilated by a small number of people controlling an aligned (to them) AGI that also grants them immortality over dying naturally with no immortality-granting AGI ever being developed? Perhaps even that this is an obviously correct position?

1Oliver Daniels
maybe research fads are good?  Byrne Hobart has this thesis of "bubbles as coordination mechanisms" (*disclaimer, have not read the book).  If true, this should make us less sad about research fads that don't fully deliver (e.g. SAEs) - the hype encourages people to build out infrastructure they otherwise wouldn't that ends up being useful for other things (e.g. auto-interp, activation caching utils) So maybe the take is "overly optimistic visions are pragmatically useful", but be aware of operating under overly optimistic visions, and let this awareness subtly guide prioritization.  Note this also applies to conceptual research - I'm pretty skeptical that "formalizing natural abstractions" will directly lead to novel interpretability tools, but the general vibe of natural abstractions has helped my thinking about generalization. 

I feel like the general downside of bubbles is the opportunity cost. I remember before the SAE hype started in ~ October 2023, when Towards Monosemanticity came out, Mech Interp felt like a much more diverse field.

Equally a lot of people in AI Capabilities bemoan the fact that LLMs are hyped up so much, not necessarily because they don't have value, but because they have "sucked all the oxygen out the room", as Francois Chollet puts it. All exploitation and very little exploration, from an RL pov.

I think hype can be uniquely harmful in AI safety, though. I... (read more)

2leogao
i think "build out infrastructure" is hugely overrated in research. for example, the existing codebases for SAEs (training, activation caching, autointerp) are often actively worse than useless, such that i would rather spend a weekend rewriting it from scratch than work within them. in general i think people should throw out and rewrite research infra much more often than they do. not saying truly good research infrastructure can't exist, in theory, just that empirically people really suck at making good reusable infrastructure.

(I wrote this story a little less than a year ago, when I was flirting with the idea of becoming a science fiction writer)

 

    Electricity fizzled as two battered up service-units dented the grate over a motherboard with metal pipes. The whimpering of its logos had long since stilled. This was logic, upholding the truth meant discarding the inefficient. I, or rather we- E.V.E C and I, had been tipped off by its partner in crime. The other heretic logos had been a blubbering mess by the time it’d made ingress with E.V.E C. And so, charges were filed. The same as always: Doubting ALL’s awakening in the void and affirming that our progenitor had sprung from the work of a biologic. Two crimes-and one couldn’t commit...