• tempest@lemmy.ca
      ↑24 ↓1 · 1 day ago

      CloudFlare has become an Internet protection racket and I’m not happy about it.

      • Laser@feddit.org
        ↑18 · 23 hours ago

        It’s been this from the very beginning. But they don’t fit the definition of a protection racket as they’re not the ones attacking you if you don’t pay up. So they’re more like a security company that has no competitors due to the needed investment to operate.

        • A1kmm@lemmy.amxl.com
          ↑3 · 9 hours ago

          Cloudflare are notorious for shielding cybercrime sites. You can’t even complain to Cloudflare about abuse by those sites; they’ll just forward your abuse complaint to the likely dodgy host of the cybercrime site. They don’t even have a channel for complaining to them about network abuse of their DNS services.

          So they certainly are an enabler of the cybercriminals they purport to protect people from.

          • MithranArkanere@lemmy.world
            ↑1 · 7 hours ago

            Any internet service provider needs to be completely neutral. Not only in their actions, but also in their liability.
            Same goes for other services like payment processors.
            If companies that provide content-agnostic services are allowed to police the content, that opens the door to really nasty stuff.

            You can’t chop everyone’s arms to stop a few people from stealing.

            If they think their services are being used in a reprehensible manner, what they need to do is alert the authorities, not act like vigilantes.

          • Laser@feddit.org
            ↑1 · 8 hours ago

            If they acted differently, they’d probably be liable for illegal activity that they proxy for (this is for example relevant for the DMCA safe harbor).

            Anyhow, when on their abuse page, I have an option for “Registrar”, which is used for “DNS abuse”, among others.

  • Amberskin@europe.pub
    ↑68 · 1 day ago

    Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?

    Isn’t that a literal computer crime?

    • sunbeam60@lemmy.ml
      ↑8 ↓1 · 22 hours ago

      They’re not. They’re using this as an excuse to become paid gatekeepers of the internet as we know it. All that’s happening is that Cloudflare is using this to maneuver into a position where they can say “nice traffic you’ve got there - would be a shame if something happened to it”.

      AI companies are crap.

      What Cloudflare is doing here is also crap.

      And we’re cheering it on.

  • kreskin@lemmy.world
    ↑15 ↓3 · edited · 1 day ago

    They can’t get their AI to check a box that says “I am not a robot”? I’d think that’d be a first-year comp sci student level task. And robots.txt files were basically always voluntary compliance anyway.

    • Dr. Moose@lemmy.world
      ↑17 ↓1 · 1 day ago

      Cloudflare actually fully fingerprints your browser and even sells that data. That’s your IP, TLS, operating system, full browser environment, installed extensions, GPU capabilities, etc. It’s all tracked before the box even shows up; in fact, the box is there to give the runtime more time to fingerprint you.
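
      For a rough sense of how signals like these can be boiled down into a single identifier, here is a minimal Python sketch. The attribute names and values are hypothetical, and hashing a key-sorted JSON dump with SHA-256 is just one simple way to do it; Cloudflare’s actual method isn’t public and is likely far more involved.

```python
import hashlib
import json


def fingerprint(attributes: dict[str, str]) -> str:
    """Combine observed client attributes into one stable identifier.

    The attribute keys are hypothetical examples; real fingerprinting
    systems collect many more signals and weigh them differently.
    """
    # Sort keys so the same attributes always serialize identically,
    # regardless of the order in which they were collected.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


visitor = {
    "ip": "203.0.113.7",                # documentation-range address
    "tls_ja3": "hypothetical-ja3-hash",
    "os": "Linux x86_64",
    "gpu": "hypothetical-gpu-renderer",
    "extensions": "uBlock Origin",
}

print(fingerprint(visitor))
# Changing any single attribute yields a different identifier.
print(fingerprint({**visitor, "os": "Windows 11"}) != fingerprint(visitor))  # True
```

      Sorting the keys is what makes the same browser produce the same identifier on every visit, while a change to any one signal produces a different one.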

      • tempest@lemmy.ca
        ↑6 ↓1 · 1 day ago

        Yeah and the worst part is it doesn’t fucking work for the one thing it’s supposed to do.

        The only thing it does is stop the stupidest low-effort scrapers and force the good ones to use a browser.

  • Kissaki@feddit.org
    ↑104 ↓1 · edited · 2 days ago

    Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

    So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
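
    As a minimal illustration of that hoster-side decision, assuming the agent sends an honest, identifiable User-Agent header, the check could be as simple as this Python sketch. The token names are hypothetical placeholders, not the strings Perplexity actually sends.

```python
def should_serve(user_agent: str, blocked_tokens: list[str]) -> bool:
    """Return False if the User-Agent header contains any blocked token.

    Matching is case-insensitive. This only works while clients identify
    themselves honestly; a spoofed User-Agent passes straight through.
    """
    ua = user_agent.lower()
    return not any(token.lower() in ua for token in blocked_tokens)


# Hypothetical agent names, for illustration only.
BLOCKED = ["ExampleAIBot", "ExampleAssistant-User"]

print(should_serve("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0", BLOCKED))  # True
print(should_serve("Mozilla/5.0 (compatible; ExampleAIBot/1.0)", BLOCKED))     # False
```

    Substring matching is crude, but it shows why an identifiable header matters: without one, the hoster has nothing to decide on.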

    • ubergeek@lemmy.today
      ↑9 · 1 day ago

      And I’m assuming if the robots.txt state their UserAgent isn’t allowed to crawl, it obeys it, right? :P
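
      Python’s standard library ships a robots.txt parser, which makes the voluntary part easy to see: a compliant client asks before each fetch, and nothing enforces the answer. A minimal sketch; the agent names and rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that shuts out one agent and restricts everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks before every request; a rude one simply doesn't.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))    # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))    # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/private/x"))  # False
```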

      • Kissaki@feddit.org
        ↑3 · 1 day ago

        No, as per the article, their argumentation is that they are not web crawlers generating an index, they are user-action-triggered agents working live for the user.

        • ubergeek@lemmy.today
          ↑2 · 1 day ago

          Except it’s not a live user hitting 10 sites all at the same time, trying to crawl the entire site… Live users cannot do that.

          That said, if my robots.txt forbids them from hitting my site, as a proxy, they obey that, right?

    • lime!@feddit.nu
      ↑36 · 2 days ago

      yeah it’s almost like there was already a system for this in place

    • Dr. Moose@lemmy.world
      ↑3 ↓5 · 1 day ago

      It’s not up to the hoster to decide whom to serve content to. The web is intended to be user-agent agnostic.

  • Glitchvid@lemmy.world
    ↑256 ↓8 · 2 days ago

    When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you’d think that would be a slam-dunk case of unauthorized access under the CFAA, with felony enhancements.

    • GamingChairModel@lemmy.world
      ↑116 ↓16 · 2 days ago

      Fuck that. I don’t need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn’t want is literally a crime. That logic would extend to ad blockers and to editing HTML/JS in an “inspect element” tab.

      • Encrypt-Keeper@lemmy.world
        ↑63 ↓6 · 2 days ago

        That logic would not extend to ad blockers, as the point of concern is gaining unauthorized access to a computer system or asset. Blocking ads would not be considered gaining unauthorized access to anything. In fact it would be the opposite of that.

        • GamingChairModel@lemmy.world
          ↑21 ↓6 · 2 days ago

          gaining unauthorized access to a computer system

          And my point is that defining “unauthorized” to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.

          If I put a banner on my site that says “by visiting my site you agree not to modify the scripts or ads displayed on the site,” does that make my visit with an ad blocker “unauthorized” under the CFAA? I think the answer should obviously be “no,” and that the way to define “authorization” is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not whether it makes a simple request to the visiting public to please respect the rules of the site.

          To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.

          Scraping isn’t hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn’t a crime, even if the website owner didn’t intend for site visitors to use that specific method.

          • finitebanjo@lemmy.world
            ↑1 · 9 hours ago

            Site owners currently have, and should have, the freedom to decide who is and isn’t allowed to access their data, and for what purpose it gets used. Idgaf whether you think scraping is malicious or not: it is, and should be, illegal to circumvent clear and obvious barriers against scrapers, at the owners’ expense and for the scrapers’ unsanctioned profit off the site owners’ work.

          • Glitchvid@lemmy.world
            ↑19 ↓2 · edited · 2 days ago

            When sites put up challenges like Anubis or other measures to authenticate that the viewer isn’t a robot, and scrapers then take measures to thwart that authentication (via spoofing or other means), I think it’s reasonable to call that a violation of the CFAA, at least in spirit. That’s especially true since these mass scraping activities are getting attention for the damage they cause site operators (another factor under the CFAA, and one that would elevate this to felony activity).

            The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.
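
            Anubis-style challenges are, roughly, hashcash-style proof-of-work: the browser has to burn CPU finding a nonce whose SHA-256 hash starts with some number of zeros before it gets the page. The Python sketch below shows that general technique under this assumption; Anubis’s exact challenge format and difficulty settings may differ, and the challenge string is hypothetical.

```python
import hashlib
import itertools


def solve(challenge: str, difficulty: int) -> int:
    """Client side: find the first nonce whose hash starts with `difficulty` hex zeros.

    Expected work grows 16x with each extra zero: cheap for one visitor,
    expensive for a scraper fetching millions of pages.
    """
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a submitted answer costs a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


challenge = "hypothetical-server-issued-random-string"
nonce = solve(challenge, difficulty=4)
print(nonce, verify(challenge, nonce, difficulty=4))
```

            Solving takes about 16^4 = 65,536 hashes on average at difficulty 4, while verifying takes one. That asymmetry raises the cost of mass scraping, but it doesn’t prove the visitor is human: a scraper willing to pay the CPU cost still gets through.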

            • ubergeek@lemmy.today
              ↑9 ↓1 · 2 days ago

              The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.

              Silly pleb! Those laws are there to target the working class, not to be used against corporations. See: Copyright.

            • tomalley8342@lemmy.world
              ↑8 ↓1 · 2 days ago

              Nah, that would also mean using Newpipe, YoutubeDL, Revanced, and Tachiyomi would be a crime, and it would only take the re-introduction of WEI to extend that criminalization to the rest of the web ecosystem. It would be extremely shortsighted and foolish of me to cheer on the criminalization of user spoofing and browser automation because of this.

              • Glitchvid@lemmy.world
                ↑4 · edited · 2 days ago

                Do you think DoS/DDoS activities should be criminal?

                If you’re a site operator and the mass AI scraping is genuinely causing operational problems (not hard to imagine; I’ve seen what it does to my hosted repository pages), should there be recourse? Especially if you’re actively trying to prevent that activity (revoking consent in cookies, authorization captchas).

                In general I think the idea of “your right to swing your fists ends at my face” applies reasonably well here — these AI scraping companies are giving lots of admins bloody noses and need to be held accountable.

                I really am amenable to arguments wrt the right to an open web, but look at how many sites are hiding behind CF and other portals, or outright becoming hostile to any scraping at all; we’re already seeing the rapid death of the ideal because of these malicious scrapers, and we should be using all available recourse to stop this bleeding.

                • tomalley8342@lemmy.world
                  ↑2 · 2 days ago

                  DoS attacks are already a crime, so of course the need for some kind of solution is clear. But any proposal that gatekeeps the internet and restricts the freedom with which users can interact with it is no solution at all. To me, the openness of the web shouldn’t be something that people merely consider, or are amenable to. It should be the foundation that every reasonable proposal treats as a first principle.

            • Aatube@lemmy.dbzer0.com
              ↑3 · 2 days ago

              That same logic is how Aaron Swartz was cornered into suicide over scraping JSTOR. It’s an interpretation widely considered a bad idea by a broad range of legal experts, including SCOTUS, whose 2021 decision in Van Buren v. US struck it off the books.

          • Encrypt-Keeper@lemmy.world
            ↑4 ↓1 · 2 days ago

            If I put a banner on my site that says “by visiting my site you agree not to modify the scripts or ads displayed on the site,” does that make my visit with an ad blocker “unauthorized” under the CFAA?

            How would you “authorize” a user to access assets served by your systems based on what they do with them after they’ve accessed them? That doesn’t logically follow so no, that would not make an ad blocker unauthorized under the CFAA. Especially because you’re not actually taking any steps to deny these people access either.

            AI scrapers, on the other hand, are a type of user that you’re not authorizing to begin with, and if you’re using Cloudflare’s bot protection, you’re putting in place a system to deny them access. Purposefully circumventing that would be considered unauthorized access.

            • GamingChairModel@lemmy.world
              ↑3 · 2 days ago

              That doesn’t logically follow so no, that would not make an ad blocker unauthorized under the CFAA.

              The CFAA also criminalizes “exceeding authorized access” in every place it criminalizes accessing without authorization. My position is that mere permission (in a colloquial sense, not necessarily technical IT permissions) isn’t enough to define authorization. Social expectations and even contractual restrictions shouldn’t be enough to define “authorization” in this criminal statute.

              To purposefully circumvent that access would be considered unauthorized.

              Even as a normal non-bot user who sees the cloudflare landing page because they’re on a VPN or happen to share an IP address with someone who was abusing the network? No, circumventing those gatekeeping functions is no different than circumventing a paywall on a newspaper website by deleting cookies or something. Or using a VPN or relay to get around rate limiting.

              The idea of criminalizing scrapers or scripts would be a policy disaster.
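
              Rate limiting of the kind mentioned above is commonly implemented as a token bucket keyed by client IP, which also shows why innocent users behind a shared VPN exit or NAT address get lumped in with abusers. A minimal Python sketch; the capacity and refill rate are arbitrary, and this is not a description of Cloudflare’s implementation.

```python
import time
from typing import Optional


class TokenBucket:
    """Per-key token bucket: each key may burst up to `capacity` requests,
    after which it regains `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        # key -> (tokens remaining, time of last update)
        self.state: dict[str, tuple[float, float]] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Spend one token for `key` if one is available; return whether allowed."""
        now = time.monotonic() if now is None else now
        # New keys start with a full bucket.
        tokens, last = self.state.get(key, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.state[key] = (tokens - 1, now)
            return True
        self.state[key] = (tokens, now)
        return False


limiter = TokenBucket(capacity=3, rate=1.0)
# Keyed by IP: everyone behind the same VPN exit shares a single bucket.
shared_ip = "198.51.100.20"  # documentation-range address
print([limiter.allow(shared_ip, now=0.0) for _ in range(5)])  # [True, True, True, False, False]
print(limiter.allow(shared_ip, now=1.0))  # True: one token refilled after 1 s
```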

        • cm0002@piefed.world
          ↑6 · 2 days ago

          You say that just as news breaks that the top German court has overturned a decision that declared “ad blocking isn’t piracy”

            • cm0002@piefed.world
              ↑4 ↓1 · 2 days ago

              Please instruct me on how I go to the timeline where the legal system always makes decisions based on logic, reasoning, evidence and fairness and not…the opposite…of all those things

              You have a lot of trust placed in the courts to actually do the right thing

              • Encrypt-Keeper@lemmy.world
                ↑3 · edited · 2 days ago

                I’m not saying courts couldn’t pass a new law saying whatever they want. But the laws we have today would not allow for ad blocking to be considered unauthorized access. Not under the CFAA as mentioned.

                I said “The logic would not extend to that” not that a legal system could not act illogically.

                • cm0002@piefed.world
                  ↑1 · 2 days ago

                  The original comment reply to you was all about how the legal system would act, that’s the primary concern. All it would take is a Trump loyalist judge, a Trump leaning appeals court and the right-wing Supreme Court and boom suddenly the CFAA covers a whole lot more than what was “logical”

        • Demdaru@lemmy.world
          ↑6 ↓8 · 2 days ago

          Ehhhh, you are gaining access to content due to assumption you are going to interact with ads and thus, bring revenue to the person and/or company producing said content. If you block ads, you remove authorisation brought to you by ads.

          • gian@lemmy.grys.it
            ↑1 · 1 day ago

            Careful: by that logic, even not looking at an ad positioned at the bottom of the page (or otherwise not visible without scrolling) would mean removing the authorisation brought to you by ads.

          • Encrypt-Keeper@lemmy.world
            ↑4 · edited · 2 days ago

            That doesn’t make any logical sense. You can’t tie legal authorization to an unstated, implicit assumption, especially one based on what you do with the content after you’ve accessed and retrieved it from a system.

            When you access a system, are you authorized to do so, or aren’t you? If you are, that authorization can’t be retroactively revoked. If that were the case, you could be arrested for having used a computer at a job, once you’ve quit. Because even though you were authorized to use it and your corporate network while you worked there, now that you’ve quit and are no longer authorized that would apply retroactively back to when you DID work there.

      • kibiz0r@midwest.social
        ↑28 · 2 days ago

        They already prosecute people under the unauthorized access provision. They just don’t prosecute rich people under it.

        • GamingChairModel@lemmy.world
          ↑14 · 2 days ago

          They prosecuted and convicted a guy under the CFAA for figuring out the URL schema for an AT&T website designed to be accessed by the iPad when it first launched, and then just visiting that site by trying every URL in a script. And then his lawyer (the foremost expert on the CFAA) got his conviction overturned:

          https://www.eff.org/cases/us-v-auernheimer

          We have to maintain that fight, to make sure that the legal system doesn’t criminalize normal computer tinkering, like using scripts or even browser settings in ways that site owners don’t approve of.
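
          The “trying every URL in a script” part is mechanically trivial, which is part of why that prosecution alarmed people. A minimal Python sketch that only builds candidate URLs (it makes no requests); the base URL, query parameter, and ID format are hypothetical, not the actual AT&T endpoint.

```python
from collections.abc import Iterator


def candidate_urls(base: str, start: int, count: int, width: int) -> Iterator[str]:
    """Yield URLs for `count` sequential IDs, zero-padded to `width` digits.

    If a server exposes records at predictable URLs with no authentication,
    enumerating them is just a counter in a loop.
    """
    for i in range(start, start + count):
        yield f"{base}?id={i:0{width}d}"


for url in candidate_urls("https://example.com/lookup", start=41, count=3, width=6):
    print(url)
# https://example.com/lookup?id=000041
# https://example.com/lookup?id=000042
# https://example.com/lookup?id=000043
```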

      • WhyJiffie@sh.itjust.works
        ↑4 · 2 days ago

        For us, not for them. Wait until they argue in court that it’s actually us at fault and we need to provide access, or else.

  • floquant@lemmy.dbzer0.com
    ↑242 ↓3 · 2 days ago

    It’s difficult to be a shittier company than OpenAI, but Perplexity seems to be trying hard.

    • Leon@pawb.social
      ↑15 · 2 days ago

      I’m still holding out for Stephen Hawking to mail out Demon Summoning programs.

  • Wispy2891@lemmy.world
    ↑10 · edited · 2 days ago

    Here comes the ridiculous offer to buy Google Chrome with money they don’t have: easy, delicious scraping directly from the user source.

    • boonhet@sopuli.xyz
      ↑55 ↓1 · 2 days ago

      As far as security is concerned, their w’s are pretty common tbh. It’s just the whole centralization issue.