
The Silver Fern

Artificial Intelligence (Previously "Chat GPT")

Off Topic
321 Posts 38 Posters 8.5k Views
    Bones replied to Kirwan
    #311

    @Kirwan yeah we've got an internal gpt that's typically better, but can't produce docx - I'm sure I can probably get a bot sorted if it's gonna be a regular thing, was just annoying when I'd prompt it to add something and it would change two other things and drop a third.

    Bones
    #312

    I should add, the most likely issue occurs between monitor and keyboard.

    Kirwan replied to Bones
    #313

    @Bones said in Artificial Intelligence (Previously "Chat GPT"):

    @Kirwan yeah we've got an internal gpt that's typically better, but can't produce docx - I'm sure I can probably get a bot sorted if it's gonna be a regular thing, was just annoying when I'd prompt it to add something and it would change two other things and drop a third.

    Yeah OK, I've been playing with Task MCP servers for that. There's one called TaskMaster and another called Shrimp-Tasks. These help it plan, save the steps into a file and keep its head straight about what it's doing. It's funny to watch it argue with itself when thinking is enabled.
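As a rough illustration of the general idea (not how TaskMaster or Shrimp-Tasks actually work internally, and not their tool interfaces), here's a minimal Python sketch of a plan that is saved to a file after every change and re-read on the next step, so progress doesn't depend on the model's chat memory. The `Task`/`TaskPlan` names and the `tasks.json` layout are made up for the example.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Task:
    """One step of the plan (hypothetical schema, not either server's format)."""
    id: int
    description: str
    done: bool = False


@dataclass
class TaskPlan:
    """A task list written to disk after every change so it survives between prompts."""
    path: Path
    tasks: list[Task] = field(default_factory=list)

    @classmethod
    def load(cls, path: Path) -> TaskPlan:
        # Start empty if no plan file exists yet; otherwise rebuild Task objects.
        if not path.exists():
            return cls(path)
        items = json.loads(path.read_text())
        return cls(path, [Task(**item) for item in items])

    def save(self) -> None:
        self.path.write_text(json.dumps([asdict(t) for t in self.tasks], indent=2))

    def add(self, description: str) -> Task:
        task = Task(id=len(self.tasks) + 1, description=description)
        self.tasks.append(task)
        self.save()
        return task

    def complete(self, task_id: int) -> None:
        for task in self.tasks:
            if task.id == task_id:
                task.done = True
                self.save()
                return
        raise KeyError(f"no task with id {task_id}")

    def next_task(self) -> Task | None:
        # The first unfinished step is what the model should work on next.
        return next((t for t in self.tasks if not t.done), None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plan_file = Path(tmp) / "tasks.json"
        plan = TaskPlan.load(plan_file)
        plan.add("Add the new section")
        plan.add("Check the other sections are unchanged")
        plan.complete(1)

        # A fresh load (e.g. on the next prompt) still sees the same progress.
        resumed = TaskPlan.load(plan_file)
        print(resumed.next_task().description)  # prints "Check the other sections are unchanged"
```

Writing to disk on every change is the design choice that matters here: the "add something, change two other things, drop a third" problem is partly the model losing track, and a persisted checklist gives it something fixed to re-read.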

    nzzp
    #314

    Dropping in for some Fern wisdom.

    Looking at levelling up some of my AI usage. I have access to Copilot, which seems 'fine' but is probably a bit behind the curve. Considering springing for ChatGPT Plus. Are there better ones to look at?

    Usage will be in a professional environment typically processing language into standard forms - parsing email chains and dictation into templates (basically doing my donkey work for me).

    Kirwan
    #315

    Wait a week (or so) and evaluate Grok 4.

    The top models are all very similar at this point. Claude is the best for coding and some are better for visual stuff, but for pure text work you won't see a huge difference, so you can be driven by price.

    The Grok team iterate faster. OpenAI are the 'leaders' right now, but with the exodus of talent and the huge data centers their competitors are setting up, I'm not sure how long that will last.

    I'd recommend picking a monthly plan over a yearly one so you can swap as you need to.

    Kirwan
    #316

    OpenAI lost four more to Zuck. They are being gutted.

    An interesting tidbit: part of their offer is normal working hours. Sounds like OpenAI was 80+ hours a week.

    Kirwan
    #317

    The graph is crazy. This is a third-party benchmark; Grok 4 is the first model to break 10%, and it scores twice as high as second place.

    Rembrandt
    #318

    @grok explain this to me as though I'm an 8-year-old

    *Okay! Imagine you have a big toy puzzle that’s super tricky, and it’s called ARC-AGI-2. This puzzle is special because it’s designed to test how smart something is at figuring out new things, kind of like how you learn to solve a new game without anyone telling you all the rules first. Grown-ups use it to see how smart their robot helpers, like AI (which is like a super smart computer brain), can be.

    Now, there’s a robot brain called Grok 4, made by a team called xAI. It’s like the smartest kid in class who just figured out how to solve 15.9% of that tricky puzzle! That’s a big deal because the other robot brains could only solve about half that much before. It’s like going from solving 8 pieces to solving 15 pieces of a 100-piece puzzle!

    The people at ARC Prize, who made the puzzle, checked Grok 4’s work and said, “Wow, you’re really good!” They even showed it on a big chart (like a scoreboard) to prove it. This makes Grok 4 the new champion, and it’s exciting because it means robots are getting better at thinking like us.

    But here’s a little grown-up stuff: some people are worried because making Grok 4 super smart used a lot of energy, and that’s not always good for the planet. It’s like using extra batteries that might hurt the Earth. So, while it’s cool that Grok 4 is so smart, grown-ups are also thinking about how to keep things safe and happy for everyone!*

    Kirwan
    #319

    The main reason this graph is important is that the questions are private and are written so they don't appear anywhere on the internet. So the models can't be trained to the benchmark, and they haven't seen the questions before.

    That's why all the models were previously doing so poorly. The guy who runs this describes anything under 10% as noise.

    This is the first step towards a model working out answers from "first principles" (a much overused term). It's a huge achievement, and the first step to these things creating new knowledge (and yes, I know Google's model has created a new algorithm, but this is a different approach from theirs).

    The other test was "Humanity's Last Exam", and the models were previously tapping out on that at 25%. These are all PhD-level-and-above questions across many different domains; no single human could possibly answer them all, it would take a team of experts.

    Grok Heavy got 50.7% correct.

    Kirwan
    #320

    Rembrandt replied to Kirwan
    #321

    @Kirwan The Sycophancy chapter is fascinating. AI with morals.
