• ChaoticNeutralCzech@feddit.org · 109 up, 1 down · edited · 11 days ago

      A bot strips away all spaces and letters that aren’t A, T, C or G, then treats the rest like a genetic sequence and checks it against some database.

      Presumably, it runs through many terabytes of data for each comment, as the Gallinula chloropus alone has about 51 billion base pairs, or some 15 GiB. The Genome Ark DB, which has sequences of two common moorhens, contains over 1 PiB. I wonder if a bored sequencing lab employee just wrote it to give their database and computing servers something to do when there is no task running.

      No, I won’t download the genome and check how close the “closest match” is, but statistically, 93 base pairs are expected to recur every $2^{186}$ bits, or once per $10^{40}$ PiB. By evaluating the inequality $(4-1)^m \binom{93}{m} \ge 4^{93} \div (\text{pebi} \times 8)$, one can expect the 93-base sequence to appear at least once in a 1 PiB database if $m \ge 32$ mismatches, or over ⅓, are allowed. Not great.

      This assumes true randomness, which holds for neither naturally occurring DNA nor letters in English text, but it should be in the right ballpark. Maybe fewer mismatches are needed if you account for insertions/deletions.

      • sp3tr4l@lemmy.zip · 67 up · edited · 11 days ago

        The FAQ on the user’s page says:

        1. They are not a bot, just neurodivergent

        2. They’re using BLAST

        ie, this

        https://blast.ncbi.nlm.nih.gov/Blast.cgi

        They did not code anything beyond a very simple regex function that strips posts down to a, t, c and g; they then copy-paste the result into the above website and copy-paste the output back.

        Hell, you can see they aren’t even removing apostrophes and quotes, not even forcing it to all lower case or all upper case, removing spaces and line breaks…
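For illustration, a filter of the kind described above fits in a few lines. This is a hypothetical sketch, not the blog author's actual code: the name `strip_to_bases` and the choice to upper-case first are assumptions.

```python
import re

# Anything that is not one of the four DNA bases gets dropped.
_NON_BASE = re.compile(r"[^ACGT]")


def strip_to_bases(text: str) -> str:
    """Upper-case the text, then remove every character that is not A, C, G or T."""
    return _NON_BASE.sub("", text.upper())


if __name__ == "__main__":
    print(strip_to_bases("A cat, a tag!"))
```

Upper-casing first means spaces, punctuation, line breaks and mixed case are all handled by the single substitution.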

        … as a former database admin/dev/analyst, I was losing my fucking mind at the notion that someone with direct access to a genomics DB, would just hook it up to tumblr, via an automated bot, and spam the db with non work related requests, all on their own, when they can barely modify a string correctly.

        Thank fucking god this is just using a publicly available, no doubt extremely low fidelity, watered down search via an API.

        … You need literal, state of the art, absurdly expensive, power hungry, and secure supercomputers to be able to do genomic comparisons.

        Probably one of the dumbest things you could do, quickest way to get fired, and then never be able to work in the field again, would be for a random genomics lab worker who does not know how to code to open up a whole bunch of security holes and cost god knows how much money (and damage if you write bad code) running frivolous bs searches in their state of the art genomics db… for a tumblr bot.

          • sp3tr4l@lemmy.zip · 2 up, 7 down · 11 days ago

            I mean, I am also autistic, so thanks for perpetuating the social stigma against neurodivergent people, I guess.

            • Machinist@lemmy.world · 6 up · 10 days ago

              I thought it was funny. I’m a typical. Have had several relationships with neurodivergent people, including my wife.

              I do find a lot of the quirks funny or cute. Was just giving my girl shit about the Princess and the Pea because she is extremely particular about her pillow situation. The pillows and stuffies have names. That shit is funny and it makes me grin when I have to help sort the pile.

              Why do you find it offensive?

              • sp3tr4l@lemmy.zip · 5 up, 1 down · edited · 10 days ago

                Well, your story about finding certain attributes of your wife funny is an entirely different context, and you didn’t use the term as a pejorative.

                The person I am responding to used the term as a pejorative, in reference to how a neurodivergent person could easily be confused with an automated bot.

                This is inherently dehumanizing.

                It’s dismissive, it equates neurodivergent people to being sterile, non emotional beings who only exist to perform complex technical tasks.

                This in and of itself is a common stereotype of certain kinds of people with certain kinds of neurodiversity, but neurodiverse actually refers to a much broader range of… different styles of cognitive function, different disorders, whatever you want to call them.

                So, now on top of using the term as a pejorative, contextually perpetuating a specific dehumanizing stereotype… it also lumps a diverse group of people into an oversimplified conglomerate, which in and of itself perpetuates other stereotypes by erroneously associating aspects that may (or may not) apply to a specific subset of neurodiverse people… with all of them.

                • Machinist@lemmy.world · 4 up · 10 days ago

                  I guess I see where you’re coming from. Labels can hit different, especially when the label doesn’t fit all the recipients. Being labeled can cause offense. Especially if it’s derogatory. I don’t think it was meant to be derogatory by op, but it certainly wasn’t very sensitive.

                  The difficult part is that it’s a spectrum. Especially when it comes to level of function. Profound autism is a totally different animal from high functioning people. And there is a whole spectrum of differences in how the divergency manifests between individuals.

                  Savantism and savant-like actions are fascinating to a lot of typicals, myself included. That level of focus and ability to make the connections or internally churn the information is not an accessible state for most of us. It’s like seeing real magic.

                  (Obviously, not all neurodivergent folks have savant-like behaviors, most likely just a minority. No idea of the prevalence.)

                  So, a neurodivergent person inputting letters scraped from Tumblr posts into a genome search engine is funny as hell because it’s such a strange thing to do and produces an interesting result. Why would someone do that? Why would you even think to do it in the first place?

                  My wife does absolutely hilarious shit all the time. Our house is full of laughter. She’s wickedly sarcastic and full of black humor.

                  So, given that I think some of the behaviors are awesome while being hysterically funny, what is an inoffensive way to engage in humor about neurodivergent folks, in your opinion? Are there any preferred terms that are shorthand for: “Autistic person pulled some fucked up logic trick or other stunt”?

                  • sp3tr4l@lemmy.zip · 2 up · edited · 10 days ago

                    > Being labeled can cause offense. Especially if it’s derogatory. I don’t think it was meant to be derogatory by op, but it certainly wasn’t very sensitive.

                    I realize you expand on this in the rest of your response… but if you had only said this…

                    Imagine saying that to a black man in the 60s in the south who just got called ‘boy’.

                    Imagine saying this to a Chinese person in the 40s who just got called a ‘Jap’ or a ‘Nip’.

                    Imagine saying this to a person with Down syndrome in the 90s who just got called ‘a retard’.

                    … When people, who have unalterable traits, tell you that they do not appreciate being stereotyped, having certain words used to describe them or people like them, or erroneously lumped in as the same as them, in certain contexts and ways… the decent thing to do is just listen to them and not demand an explanation why they find such things offensive.

                    Anyway, I believe you when you say that you have had relationships with neurodiverse people, that you truly love your wife, that her quirks are a source of joy for you.

                    I do not mean to be offensive, but you describe neurodiverse people in a… typical way that a genuinely well intentioned neurotypical person who has actually gone out of their way to learn about and personally knows neurodiverse people would.

                    … I am apparently quite an oddity in that I am a high functioning autistic person. I don’t like to use the term ‘savant’ because it connotes that I am some kind of super genius. I’m not a super genius.

                    I have two college degrees, I consider myself more intelligent than others in many ways, but absolutely less intelligent or capable in others.

                    As an example of the latter… there is basically no way I could have this exchange with you in person, over the phone or video conference.

                    I would get too flustered and trip over my words. I would interject when I believe you are pausing to allow me to speak, but in actuality you were not expecting that and would find my interjection rude.

                    EDIT: To further this point, I think I’ve spent 2 or 3 hours now, writing and rewriting almost all of this post.

                    I would also make connections between topics and concepts that most people think are totally unrelated non sequiturs which make no sense, although you have stated that you find such connections to be ‘like seeing real magic’.

                    I cannot tell you the number of times I’ve been brushed off as a babbling loon by people who lack the patience to allow me to finish explaining the connections that occur to me, who lack the knowledge to even understand many of the concepts I connect together.

                    It is extremely frustrating.

                    In my life, it’s roughly a 20:1 ratio of people who just think I am babbling to people who actually contemplate seriously what I am saying, and often respond with something akin to… ‘wow. I never thought of that in that way, but that makes a lot of sense!’

                    > So, a neurodivergent person inputting letters scraped from Tumblr posts into a genome search engine is funny as hell because it’s such a strange thing to do and produces an interesting result. Why would someone do that? Why would you even think to do it in the first place?

                    My perspective on this is:

                    Other than the inherent incongruity of the abrupt topic shift, from discussing the original image and its absurd visual metaphors… to ‘suddenly, genomic sequence of bird!’, being odd, out of place…

                    Sure, it’s uncommon, novel, to read the genomic post.

                    But why would you even ask why someone would think to do that?

                    That’s just a thing they enjoy doing. It’s a hobby.

                    Why do people learn to unicycle? Garden? Drive a motorcycle? Ride a horse? Build sandcastles? Learn to dance? Build minifigs? Collect fucking funko pops?

                    People just enjoy doing things. Sure, some are more niche and rare than others… but why is there even a question as to why someone has some specific hobby as opposed to another?

                    Why does an uncommon hobby warrant explanation?

                    How can there be an explanation beyond ‘I find it entertaining or fulfilling or enjoyable?’

                    It would be one thing if some uncommon hobby seemed likely to engender physical or financial or mental harm to the hobbyist or others… but making a unique style of very matter-of-fact Tumblr posts doesn’t cause any harm, and they even wrote an FAQ explaining this, which… all you have to do is click on their name to understand what this person’s deal is…

                    But me, apparently (?) the only other neurodiverse person in this thread, took that basic step… while all the neurotypicals preferred to just invent their own explanations, come to their own conclusions or commentary based off of hunches and intuition, without doing even a cursory investigation to determine if their ideas had any real basis in fact.

                    > So, given that I think some of the behaviors are awesome while being hysterically funny, what is an inoffensive way to engage in humor about neurodivergent folks, in your opinion? Are there any preferred terms that are shorthand for: “Autistic person pulled some fucked up logic trick or other stunt”?

                    Well… don’t use pejoratives? Don’t use labels when they don’t need to be used, when they aren’t especially necessary? Address people by their names? Don’t present them as useless invalids, or emotionless robots?

                    Maybe present… constructive compare and contrast scenarios, where a neurotypical picks up on something an ND wouldn’t, and then the reverse happens?

                    Like… I laugh when my wife does X… but she laughs when I do Y… and when she explains to me why she finds Y funny, I come to humorous realization Z1 about her… and humorous realization Z2 about myself.

                    ??? I dunno, I don’t know how to write a comedy set, I generally do not socialize much IRL.

          • sp3tr4l@lemmy.zip · 4 up · 11 days ago

            Wayback Machine’s earliest capture is from 2008.

            It’s a cutesy, public facing, extremely limited and low fidelity ‘demo version’ of a genomic search, basically made as a PR / Science Education promotion gimmick… by government contracted web/backend devs, in 2008.

            Honestly, it’s a miracle it’s still functional at all.

      • PotatoesFall@discuss.tchncs.de · 5 up · 11 days ago

        The genomes have likely been indexed to make finding results faster. Google doesn’t search the entire internet when you make a query :P

        • ChaoticNeutralCzech@feddit.org · 3 up · edited · 11 days ago

          I know that similar computational problems use indexing and vector-space representation, but how would you build an index of TiBs of almost-random data that makes it faster to find the strictly closest match of an arbitrarily long sequence? I can think of some heuristics, such as bitmapping every occurrence of any 8-pair sequence across each kibibit in the list. A query search would then add up the bitmaps of all 8-pair sequences within the query, including ones with up to 2 errors, and use the resulting map to find “hotspots” to be checked with brute force. This would decrease the computation and storage access per query but drastically increase the storage size, which is already hard to manage.

          However, efficient fuzzy string matching in giant datasets is an interesting problem that computer scientists must have encountered before. Can you find a good paper that works well with random, non-delimited data instead of just using the approach of word-based indices for human languages like Lucene and OpenFTS?

          • sp3tr4l@lemmy.zip · 5 up · edited · 11 days ago

            As per my other post, this person isn’t doing any of that.

            But, since you asked for papers on generic matching algorithms, I found this during the silent conniption fit you sent me into after suggesting that some random tumblr user plugged a tumblr bot directly into a state of the art genomics db.

            https://link.springer.com/article/10.1007/s11227-022-04673-3

            Please note that while, yes, they ran this test on a standard office computer, they were only searching against 12 million characters.

            A single tebibyte of characters would be more like 1 trillion characters. A pebibyte would be more like 1 quadrillion.

            … much, much, much longer processing times.

            Edit: Used the wrong word for stupendously large numbers that start with q.
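As a rough sanity check on those magnitudes, the arithmetic can be spelled out. This sketch assumes one byte per character and uses the 12 million characters mentioned above as the baseline; the helper name is made up, and the ratio only tells you how much more data there is, not how run time actually scales.

```python
TEBIBYTE = 2**40           # bytes, i.e. characters at one byte each
PEBIBYTE = 2**50
PAPER_CHARS = 12_000_000   # size of the test corpus mentioned above


def scale_factor(n_chars: int, baseline: int = PAPER_CHARS) -> float:
    """How many times larger n_chars is than the baseline corpus."""
    return n_chars / baseline


if __name__ == "__main__":
    print(f"1 TiB = {TEBIBYTE:,} characters, {scale_factor(TEBIBYTE):,.0f}x the test corpus")
    print(f"1 PiB = {PEBIBYTE:,} characters, {scale_factor(PEBIBYTE):,.0f}x the test corpus")
```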

          • PotatoesFall@discuss.tchncs.de · 1 up · 11 days ago

            Yeah, good point, not a trivial undertaking. I’m not an expert in that area, but maybe Elasticsearch or a similar technology is able to find matches. Although I have no idea how that works under the hood.

    • rem26_art@fedia.io · 18 up · 11 days ago

      hellsitegenetics is a gimmick blog on tumblr that looks through popular posts on the website, tries to identify genetic sequences within them, and then posts the creature that the genetic sequence corresponds to.

      They’re a bit like haiku bot, which scans posts to see if they’re haikus and then formats the haiku and posts it, but I think hellsitegenetics is an actual person cuz they have talked about it in the past.