ChatGPT - Seriously good potential (or just some Internet fun)

Soldato
Joined
21 Nov 2004
Posts
2,832
Ah missed this - I guess it's rather unknown at this point. Last year it certainly seemed that way, as there had already been big leaps from GPT-2 to GPT-3, and GPT-4 was another big leap... but it seems like we're maybe running into some limitations now - both in the amount of (real) data available and possibly in general with this type of model*. Things could plateau a bit, with perhaps more multimodal models being released at the same level, improvements to latency, inventive third-party applications and so on. Either way, I don't think we're going to see an AI winter, as these things are already incredibly useful as they stand right now and we're perhaps barely scratching the surface in terms of their deployment and use.

*There's plenty of debate re: the scope of these things, ranging from more skeptical/grounded takes from the likes of Yann LeCun at FAIR, to Ilya Sutskever and others at OpenAI who believed that LLMs even a couple of years ago were slightly conscious.

Agree about scratching the surface; so far the applications I've seen have been narrow and within a limited sandbox. The building blocks are being created, and that will continue to improve with huge investment.

I think it's just a matter of time until they start being combined in ways that put existing companies out of business. Some organisations may have a level of protection due to being institutionalised, e.g. schools, so it probably won't happen overnight.

The dystopian view is that at some point human input is no longer required, or maybe only as a symbolic gesture. There's going to be a point where AI transitions from being a useful tool that augments the human experience to just disappearing up its own backside, leaving humans out in the cold, AKA watching their endless AI-generated TikTok videos.
 
Soldato
Joined
1 Nov 2008
Posts
4,429
Competing vision AI from the big G just dropped.

What stands out to me in this one is the recollection of something it saw earlier in the video, which is pretty bonkers. It seems to mean it's retaining data from every frame of the current clip to be able to look back on it.

Also Google Glass is back :cool:

 
Soldato
Joined
12 May 2014
Posts
5,242
I was hoping to get some opinions on the video from people who are more knowledgeable about how these LLMs work.



Edit: The comments are quite interesting. Here is one I found on the quality of data:

"Yea the problem of garbage or at least badly structured data is really clear in LLMs. Probably the most obvious example is they never say 'I don't know' because no one on the internet says 'I don't know'. People either respond or they say nothing. So the LLMs don't have any idea of uncertainty."
 
Soldato
Joined
1 Nov 2008
Posts
4,429
Not watched the Computerphile vid yet, but I did see this test earlier.


It seems even the new GPT-4o is still pretty bad at any kind of calculation.

It couldn't answer the apple problem "Tom currently has 8 apples, he ate 3 of them yesterday. How many apples does Tom currently have?"

Some of the other models could, though judging by the comments it's likely they've been tweaked to know the answer, because if you up the count to a large number of apples, they all fail it.

So clearly even basic mathematical calculations aren't on the cards unless they're able to turn the request into something they can feed into a mathematical interpreter to evaluate a result, which it doesn't seem they're doing. Although I believe GPT-4 does have one available to it, so not sure why it fails.
 
Soldato
Joined
1 Nov 2008
Posts
4,429
I tried to use the new voice feature on the iPhone app today and the servers were too busy. Just tried again and now it's been marked as completely disabled for my account :(

I guess us free user plebs won't get to use it until the hype dies down.
 
Caporegime
Joined
29 Jan 2008
Posts
58,925
Edit: The comments are quite interesting. Here is one I found on the quality of data:

"Yea the problem of garbage or at least badly structured data is really clear in LLMs. Probably the most obvious example is they never say 'I don't know' because no one on the internet says 'I don't know'. People either respond or they say nothing. So the LLMs don't have any idea of uncertainty."

Partly - it's not just training data, as there's an RLHF (reinforcement learning from human feedback) phase too where people rank responses etc. They generally want useful responses, so an answer that is mostly correct and seems useful but contains some incorrect info might not necessarily get picked up on, and there are probably fewer cases where it's seen as positive to not give an answer/state "I don't know". It would perhaps be useful for future models, in some applications, to somehow have a confidence or uncertainty indicator attached to an answer.

Also, it's not necessarily that the data is garbage or badly structured; an imbalanced data set can still be well structured and contain high-quality data - it's the inherent imbalance that means performance may vary. Earlier in this thread a poster mentioned drawing a Mini and a Hillman Imp: DALL-E could draw both, but has a much better grasp of the details of the Mini for obvious reasons, whereas for the Imp it sort of assumed/hallucinated some of it while still having an approximate idea of what it looked like. That's basically a similar sort of error with images as you get with text in some respects - it's seen as more useful to output something than to respond with a statement that it can't fully draw what was requested.

Another issue is complexity - the relevant things might be well covered in the training data, but there are some limitations to how much the model understands, so to speak; some context gets lost in translation and incorrect answers follow.

It seems even the new GPT-4o is still pretty bad at any kind of calculation.[...]

So clearly even basic mathematical calculations aren't on the cards unless they're able to turn the request into something they can feed into a mathematical interpreter to evaluate a result, which it doesn't seem they're doing. Although I believe GPT-4 does have one available to it, so not sure why it fails.


Just one thing: as noted in that thread, "of them" makes the question ambiguous, as it ostensibly refers to the 8 apples yet is contradicted by the fact that those are the apples Tom has today.

A simple prompt modification and you can get the answer they're after from the new model. In particular, if you want it to go through something logically, adding "think step by step" to the end of a prompt is usually helpful - so try this modified prompt with the ambiguity removed:

Tom currently has 8 apples, he ate 3 of them yesterday. How many apples does Tom currently have? think step by step

[screenshot: GPT-4o working through the modified apple prompt step by step]
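For anyone wanting to try this programmatically rather than in the web UI, a minimal sketch with the OpenAI Python client might look like this (model name and setup are assumptions - adjust to whatever you have access to):

```python
# Minimal sketch: appending "think step by step" to a prompt via the
# OpenAI Python client. Assumes the OPENAI_API_KEY environment variable
# is set; the model name is just an example.
from openai import OpenAI

client = OpenAI()

question = ("Tom currently has 8 apples, he ate 3 of them yesterday. "
            "How many apples does Tom currently have?")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question + " think step by step"}],
)
print(response.choices[0].message.content)
```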


It (sort of)* can reason about a simple problem if it's better formulated, and it can do graduate-level mathematics too (though that's where it tends to drop into formulating stuff in Python code and calling symbolic mathematics libraries - see the sketch below). You can still get it to come unstuck with some trick word problems like that and short brain teasers.

*It can still be thrown by a modification with a higher number, but then a simple second prompt causes it to reason further and correct itself:
[screenshot: the model correcting itself after a follow-up prompt]
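To give a flavour of the "drop into Python" behaviour mentioned above, this is roughly the sort of code these models tend to emit for symbolic maths - a hand-written illustration using SymPy, not actual model output:

```python
# Illustration of the kind of SymPy code an LLM typically generates for
# symbolic maths questions (hand-written example, not model output).
import sympy as sp

x = sp.symbols('x')

# e.g. "What is the integral of exp(-x^2) over the whole real line?"
result = sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo))
print(result)  # sqrt(pi)
```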


Actually, one thing that can very easily trip them up is modifying a well-known trick problem or brain teaser slightly so that the usual "correct" response to the original is horribly wrong.

It has gotten a bit lazy (being economical with compute), things like only outputting one image now instead of four, not using Bing unless specifically requested, not fully outputting code but leaving blanks for you, the user, to fill in. In the latter case a simple comment like "my hand is injured, please provide the full code" resolves it.

What some people do, though, is have some preamble they use as default instructions (you could use this sort of thing with other LLMs too, just pasting it in at the start of a session/convo).

Here is an example from a well-known AI researcher on Twitter (the embedded tweet hasn't survived the quote):
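As a purely hypothetical illustration of the idea (invented wording, not the researcher's actual preamble), such default instructions can be sent as a system message at the start of each session:

```python
# Hypothetical default-instructions preamble sent as a system message.
# The wording is invented for illustration; it is not the researcher's
# actual preamble.
from openai import OpenAI

client = OpenAI()

PREAMBLE = (
    "Always provide complete code with no placeholders or omitted sections. "
    "If you are unsure of an answer, say so explicitly."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": PREAMBLE},
        {"role": "user", "content": "Write a function that parses a CSV file."},
    ],
)
print(response.choices[0].message.content)
```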
 
Soldato
Joined
1 Nov 2008
Posts
4,429
It has gotten a bit lazy (being economical with compute), things like only outputting one image now instead of four, not using Bing unless specifically requested, not fully outputting code but leaving blanks for you, the user, to fill in. In the latter case a simple comment like "my hand is injured, please provide the full code" resolves it.

Yeah, I don't use it enough to have noticed this, but I can't imagine how frustrating it must be to have it fight you on generating code that it happily spat out 6 months ago. I think that's why a lot of people have moved to Claude and other models for code.
 
Soldato
Joined
1 Nov 2008
Posts
4,429
Good quick overview of both the OpenAI and Google I/O announcements yesterday.


The 2M-token context length of Gemini 1.5 Pro blows everything else out of the water, plus you can now cache tokens. That'll be great for giving it the context of a current software project when working locally, as it'll be able to ingest all the files.

This is a great example where they ingest the three.js JavaScript library and get some great answers on how to work with it.
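As a rough sketch of what "ingest all the files" could look like locally with the google-generativeai Python SDK (the file filter, project path, model name and prompt are my own assumptions):

```python
# Rough sketch: concatenating a local project's source files into one
# long-context Gemini prompt. Assumes the google-generativeai package and
# a GOOGLE_API_KEY environment variable; exact limits/model names may differ.
import os
import pathlib

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")

project = pathlib.Path("./my_project")  # hypothetical project directory
sources = []
for path in sorted(project.rglob("*.js")):
    sources.append(f"// FILE: {path}\n{path.read_text(errors='ignore')}")

prompt = "\n\n".join(sources) + "\n\nExplain how the renderer module works."
print(model.generate_content(prompt).text)
```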

 
Soldato
Joined
12 May 2014
Posts
5,242
I found this interesting video with a more pessimistic view of AI and the tech industry in general, and was looking for some opinions on it. I've always wondered how much money some of these ventures actually generate compared to the expense, but it has always seemed quite opaque.

How has AI affected your job? I know coders have been using it extensively - how much of an improvement is it over just googling your query?

Are you someone whose colleagues have been sacked and you're now expected to pick up the slack using AI? How successful has that been?

Did the big data craze actually produce any useful work output or was it all smoke and mirrors?

Also, should this be a separate thread?

 
Associate
Joined
8 Oct 2020
Posts
2,379
How has AI affected your job? I know coders have been using it extensively, How much of an improvement is it over just googling your query?
It's good for boilerplate or tedious refactoring, but otherwise it's generally a bit of a distraction. These models are also prone to making up syntax that doesn't even exist in the language, especially when working on more complex problems.

I've found it useful when learning new languages, but I have run into issues where it suggests outdated syntax. You'll never truly learn a language or become proficient in it just by using LLMs - reading the docs and looking up problems yourself is superior.
 
Soldato
Joined
1 Mar 2010
Posts
22,129
It's good for boilerplate or tedious refactoring, but otherwise it's generally a bit of a distraction.
As a business, does your company pay for a model trained in particular specialities?
(I mean, for my industry - chip/logic design - a standard LLM will know little about power-efficient implementation of a floating point unit, say.)

.......

That apples example (e.g. https://www.reddit.com/r/ChatGPT/comments/18mm1n3/ai_gets_more_human_every_day/ ) highlights that ChatGPT needs to be able to identify questions that are posed in obscure/obfuscated ways, and just tell the user that - which is like the new challenge (from a couple of weeks back) of having it recognise sarcasm.
 
Associate
Joined
8 Oct 2020
Posts
2,379
As a business, does your company pay for a model trained in particular specialities?
(I mean, for my industry - chip/logic design - a standard LLM will know little about power-efficient implementation of a floating point unit, say.)

.......

That apples example (e.g. https://www.reddit.com/r/ChatGPT/comments/18mm1n3/ai_gets_more_human_every_day/ ) highlights that ChatGPT needs to be able to identify questions that are posed in obscure/obfuscated ways, and just tell the user that - which is like the new challenge (from a couple of weeks back) of having it recognise sarcasm.
That's largely my point - you need something with an up-to-date understanding of each language, whereas these LLMs are largely just prediction engines.

Don't get me wrong, they offer value, but by no means replace real knowledge.
 
Caporegime
Joined
23 Apr 2014
Posts
29,664
Location
Bell End, near Lickey End
Don't get me wrong, they offer value, but by no means replace real knowledge.

The problem lies with companies that value short-term efficiency and just getting things through. Even before the likes of ChatGPT, we know that many companies were happy to fudge things to get them through/released on time.

Just as the most knowledgeable person doesn't always get the job, a developer who can "fix" things quickly using LLMs may be more valuable to those companies that don't really care how it gets done. I wouldn't like to estimate the number of bug fixes etc. that have been done by copying/pasting code from an LLM where the developer has little or no idea why it worked, or whether it was only a temporary fix.

I'd agree that it's better to have someone with real knowledge, who's thinking long term and not just potentially sticking plasters all over the place.
 
Associate
Joined
8 Oct 2020
Posts
2,379
The problem lies with companies that value short-term efficiency and just getting things through. Even before the likes of ChatGPT, we know that many companies were happy to fudge things to get them through/released on time.

Just as the most knowledgeable person doesn't always get the job, a developer who can "fix" things quickly using LLMs may be more valuable to those companies that don't really care how it gets done.

I'd agree that it's better to have someone with real knowledge, who's thinking long term and not just potentially sticking plasters all over the place.
Agreed, I mean this whole "AI" boom is down to investors blindly throwing money at it and companies trying to get said money.

The hype cycle feels like it's hit its peak, at least for any domain that requires real intelligence. These LLMs can't build complete solutions, but you can bet companies are going to try to cobble together a couple of scripts and call it a product (like you mentioned).

Nobody is building anything of any significant value, scale, etc. purely using LLMs, unless it's a scam, like that Rabbit thing.
 

D3K

Soldato
Joined
13 Nov 2014
Posts
3,779
I started building up a set of local AI tools over the weekend through Llama and Open WebUI. I effectively have multi-modal in a WSL VM.

Next step is to look into RAG. I initially tried the llama3-chatqa:70b model, but that kills the hard drive since you need 64GB of RAM. It can also be helped by a lot more VRAM, and I've no doubt trying to run it through WSL is impacting performance as well.
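For anyone else heading down the RAG path, a bare-bones sketch against a local Ollama server could look something like this (the endpoints are from Ollama's documented API; the model names and toy documents are my own assumptions):

```python
# Bare-bones RAG sketch against a local Ollama server on localhost:11434.
# Assumes an embedding model (nomic-embed-text) and a chat model (llama3)
# have already been pulled; documents here are toy examples.
import requests

OLLAMA = "http://localhost:11434"

def embed(text):
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

docs = ["WSL VMs share host RAM unless capped in .wslconfig.",
        "70B models need roughly 40GB+ of memory even at 4-bit quantisation."]
doc_vecs = [embed(d) for d in docs]

query = "Why does a 70b model thrash my disk?"
q_vec = embed(query)
best = max(range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]))

# Stuff the best-matching document into the prompt as context.
r = requests.post(f"{OLLAMA}/api/generate",
                  json={"model": "llama3", "stream": False,
                        "prompt": f"Context: {docs[best]}\n\nQuestion: {query}"})
print(r.json()["response"])
```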

It would be good to have a thread dedicated to local AI since it's on the rise and is pretty well supported by all the players. In fact, maybe it's time we had an AI forum section?
 
Caporegime
Joined
29 Jan 2008
Posts
58,925
I found this interesting video with a more pessimistic view of AI and the tech industry in general, and was looking for some opinions on it.

Seems more like a criticism of some of the companies/hype than of the underlying tech. The same person could have used the same sort of arguments/approach to call the world wide web a "hoax" during the dot-com bubble.

Obviously, various dot-com companies were over-hyped, just as various firms targeting "big data" have been, and ditto some AI startups.

Nobody is building anything of any significant value, scale, etc. purely using LLMs, unless it's a scam, like that Rabbit thing.

I don't think that's true - coding assistant LLMs certainly aren't a scam; Microsoft Copilot is a useful product. Claims by startups that they're replacing software engineers are overhyped (especially when said startups are still hiring engineers themselves :D), but making engineers more productive is certainly useful and this is only going to improve.

Translation is another useful area. There's a bit of lag when having to chain together different models (STT -> LLM -> TTS), but the new multimodal LLMs solve that, so real-time voice translation via your phone will be possible as soon as the multimodal capabilities of GPT-4o are released.
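The chained approach is simple enough to sketch with OpenAI's audio and chat APIs (file, model and voice names here are illustrative); each hop is a separate network round trip, which is where the lag comes from:

```python
# Sketch of the chained STT -> LLM -> TTS translation pipeline described
# above. Each hop adds latency, which is what the multimodal models avoid.
from openai import OpenAI

client = OpenAI()

# 1. Speech-to-text (transcription)
with open("spoken_french.mp3", "rb") as f:
    text = client.audio.transcriptions.create(model="whisper-1", file=f).text

# 2. LLM translation
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"Translate into English: {text}"}],
).choices[0].message.content

# 3. Text-to-speech
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
speech.write_to_file("translated.mp3")
```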

There are the traditional NLP things like sentiment analysis, chatbots and summarisation too, of course, all of which LLMs can do quite easily. There's also a recent paper claiming that, for equity analysis based on financial reports, LLMs can outperform human analysts.

There are some interesting applications with drones too - converting voice commands into instructions for the drone - and some applications in robotics. There could well be a lot more use cases where LLMs are integrated with other things. Also, "purely using LLMs" is a bit subjective there: you mention the Rabbit thing, but that isn't purely an LLM - it's intended to be an agent and to launch apps etc. There are a few iffy things about that device, though things along those lines will likely be useful - albeit there's no reason why such a thing can't exist on our phones soon enough.

Apple is probably going to put a small LLM on future iPhones, with Siri perhaps getting an upgrade, so we will likely end up with a Rabbit-type assistant anyway.
 
Associate
Joined
8 Oct 2020
Posts
2,379
I don't think that's true - coding assistant LLMs certainly aren't a scam; Microsoft Copilot is a useful product. Claims by startups that they're replacing software engineers are overhyped (especially when said startups are still hiring engineers themselves :D), but making engineers more productive is certainly useful and this is only going to improve.
Yea, that's why I said "purely".

I don't doubt that it has value as an assistant, but even a team of only junior to mid-level developers would struggle to build a decent, relatively complex product on LLM assistance alone.

I see more value in things like contract review or, like you said, chatbots, where there are strict but relatively simple rules. You'll sooner replace a group of lawyers or doctors than a team of developers.
 
Caporegime
Joined
29 Jan 2008
Posts
58,925
Yea, that's why I said "purely".

Ah OK, fair enough - I thought you meant the products themselves, as you gave Rabbit as an example. But if you're talking about building things in one shot using LLMs, then yeah, that's more for smaller stuff; technically you can get a mobile game done with a single prompt (someone got a Flappy Bird clone that way).

I think the coding assistants will improve a bit - it seems like agents do help there. Part of the problem, aside from things like libraries being updated since the LLM was trained, is the large amount of prior knowledge someone needs to work with an enterprise code base. That could be solved by, say, a company fine-tuning an LLM on its own code.
 
Soldato
Joined
14 May 2009
Posts
4,200
Location
Hampshire
I use ChatGPT on an almost daily basis and I'm a Plus user.

I've had to re-write a Technical Requirements document recently, which is almost 90 pages long.

ChatGPT pretty much wrote most of the business requirements and acceptance criteria which saved me weeks of work.
 
Soldato
Joined
22 Nov 2006
Posts
23,523
I use it when writing fluff, like cover letters for job applications. Got interviews and a much better job out of it, so it works lol.
 