1
00:00:03,520 --> 00:00:06,080
Welcome to episode 397

2
00:00:06,080 --> 00:00:09,279
of the Microsoft Cloud IT Pro podcast recorded

3
00:00:09,279 --> 00:00:11,939
live on 03/10/2025.

4
00:00:12,160 --> 00:00:14,480
This is a show about Microsoft 365

5
00:00:14,480 --> 00:00:16,554
and Azure from the perspective of IT

6
00:00:16,554 --> 00:00:18,714
pros and end users, where we discuss the

7
00:00:18,714 --> 00:00:20,875
topic or recent news and how it relates

8
00:00:20,875 --> 00:00:23,114
to you. We've been talking a lot about

9
00:00:23,114 --> 00:00:24,494
AI recently, particularly

10
00:00:24,954 --> 00:00:26,175
Microsoft Copilots.

11
00:00:26,875 --> 00:00:28,714
But what if you want to play around

12
00:00:28,714 --> 00:00:31,960
with AI outside of Copilot or ChatGPT

13
00:00:31,960 --> 00:00:34,600
or any other hosted AI tool? In today's

14
00:00:34,600 --> 00:00:35,100
episode,

15
00:00:35,479 --> 00:00:37,559
Scott and Ben dive into the world of

16
00:00:37,559 --> 00:00:38,539
local LLMs,

17
00:00:39,000 --> 00:00:41,659
large language models, that run entirely

18
00:00:42,039 --> 00:00:44,295
on your device. We look at what models

19
00:00:44,295 --> 00:00:45,975
you can run, how you can integrate them

20
00:00:45,975 --> 00:00:47,914
into your workflow, and more.

21
00:00:49,975 --> 00:00:52,375
Oh, Scott. Here we are back in the

22
00:00:52,375 --> 00:00:55,495
stormy South. Stormy South. It has been stormy,

23
00:00:55,495 --> 00:00:57,539
but it's bright and sunny now. So I'll

24
00:00:57,539 --> 00:00:58,579
take it while I can get it. I

25
00:00:58,579 --> 00:01:00,579
don't have anything to go with Nordic. From

26
00:01:00,579 --> 00:01:02,740
the Nordic North, I'm back to the stormy

27
00:01:02,740 --> 00:01:03,240
South.

28
00:01:04,579 --> 00:01:06,659
From sea to shining sea and everything in

29
00:01:06,659 --> 00:01:08,260
between the seas? As long as you count

30
00:01:08,260 --> 00:01:10,005
Lake Michigan as the sea, which if you're

31
00:01:10,005 --> 00:01:12,084
from Michigan, you do. Like, East Coast, West

32
00:01:12,084 --> 00:01:13,924
Coast in Michigan are Lake Michigan and Lake

33
00:01:13,924 --> 00:01:16,245
Huron. We don't really count oceans in Michigan.

34
00:01:16,245 --> 00:01:18,084
Some of those lakes are kinda big, so,

35
00:01:18,484 --> 00:01:20,084
you might even say they're great. They could

36
00:01:20,084 --> 00:01:21,704
be great. It's always interesting.

37
00:01:22,005 --> 00:01:24,165
Side topic, coming down to Florida and talking

38
00:01:24,165 --> 00:01:26,549
to people about lakes and being from Michigan

39
00:01:26,549 --> 00:01:27,370
and Lake Michigan

40
00:01:27,750 --> 00:01:29,530
and Lake Superior and

41
00:01:29,829 --> 00:01:31,109
they're like, but it's a lake. And I'm

42
00:01:31,109 --> 00:01:32,790
like, yeah, but you can't see across it.

43
00:01:32,790 --> 00:01:34,629
So it kinda looks like an ocean when

44
00:01:34,629 --> 00:01:36,549
you're standing on the shore, and we get

45
00:01:36,549 --> 00:01:38,134
waves that are like, well, I think the

46
00:01:38,134 --> 00:01:40,614
biggest waves I've ever recorded in Lake Michigan

47
00:01:40,614 --> 00:01:42,774
were like 25, 26 feet, and Lake

48
00:01:42,774 --> 00:01:44,774
Superior was up to 32

49
00:01:44,774 --> 00:01:46,854
foot waves. It's like these are not just

50
00:01:46,854 --> 00:01:49,414
like little lakes. These are massive bodies of

51
00:01:49,414 --> 00:01:49,914
water.

52
00:01:50,900 --> 00:01:53,159
They're really, really big ponds, you know.

53
00:01:53,700 --> 00:01:55,060
So now we're going across the

54
00:01:55,060 --> 00:01:56,579
pond and that means across Lake Michigan.

55
00:01:56,579 --> 00:01:58,019
Ask an LLM what it thinks.

56
00:01:58,019 --> 00:02:00,500
We should ask an LLM because we're going

57
00:02:00,500 --> 00:02:02,099
to talk about LLMs. We're back

58
00:02:02,099 --> 00:02:04,579
to those things again. So I wanted to

59
00:02:04,579 --> 00:02:05,640
have a chat today

60
00:02:06,019 --> 00:02:06,519
about

61
00:02:07,034 --> 00:02:07,534
LLMs

62
00:02:07,994 --> 00:02:08,495
and

63
00:02:08,955 --> 00:02:11,514
running them locally. Like, I've been doing this

64
00:02:11,514 --> 00:02:13,675
more and more, and I think there's an

65
00:02:13,675 --> 00:02:16,155
interesting set of use cases and workflows. And

66
00:02:16,155 --> 00:02:17,614
I was having a chat with you,

67
00:02:17,914 --> 00:02:20,340
and this isn't something that you do in

68
00:02:20,340 --> 00:02:22,099
your kinda day-to-day from what it sounds

69
00:02:22,099 --> 00:02:23,699
like, but maybe I can, like, get you

70
00:02:23,699 --> 00:02:25,219
in and and and hook you in a

71
00:02:25,219 --> 00:02:27,060
little bit along the way. Oh, you've already

72
00:02:27,060 --> 00:02:28,340
got me hooked. You sent me a few

73
00:02:28,340 --> 00:02:30,340
YouTube videos, and I started watching it, and

74
00:02:30,340 --> 00:02:31,639
the wheels started clicking.

75
00:02:31,955 --> 00:02:34,194
And I have one of the browser tabs

76
00:02:34,194 --> 00:02:35,634
up here. We'll put a link to it,

77
00:02:35,634 --> 00:02:37,555
Scott, about a use case that I already

78
00:02:37,555 --> 00:02:39,254
have for a local LLM.

79
00:02:39,715 --> 00:02:41,574
And you definitely got my wheels

80
00:02:41,875 --> 00:02:42,935
turning about

81
00:02:43,235 --> 00:02:45,715
what possibilities there are about how some of

82
00:02:45,715 --> 00:02:48,310
this works. In Microsoft 365, I

83
00:02:48,310 --> 00:02:50,469
have played around with Copilot. I know a

84
00:02:50,469 --> 00:02:52,229
fair amount, but I've never really looked at

85
00:02:52,229 --> 00:02:54,650
running them locally, and you whet my appetite

86
00:02:54,949 --> 00:02:57,370
for this. So this will be an interesting

87
00:02:57,430 --> 00:03:00,229
discussion, and I'm curious to see where it

88
00:03:00,229 --> 00:03:01,849
goes, your thoughts,

89
00:03:02,284 --> 00:03:03,025
my new

90
00:03:03,405 --> 00:03:06,444
thoughts. And my expanding list, Scott, you added

91
00:03:06,444 --> 00:03:08,444
something new to my list. I was doing

92
00:03:08,444 --> 00:03:10,444
so good. It's been a hot minute, but

93
00:03:10,444 --> 00:03:12,044
I I I think this is an important

94
00:03:12,044 --> 00:03:14,064
one. So as we talk about

95
00:03:14,540 --> 00:03:15,040
the

96
00:03:15,659 --> 00:03:16,879
kinda growth

97
00:03:17,259 --> 00:03:17,759
of

98
00:03:18,219 --> 00:03:19,199
generative AI

99
00:03:19,819 --> 00:03:22,800
and models along the way for,

100
00:03:23,580 --> 00:03:26,139
you know, certainly the the copilots of the

101
00:03:26,139 --> 00:03:26,639
world,

102
00:03:27,094 --> 00:03:27,754
the OpenAIs,

103
00:03:28,775 --> 00:03:30,474
Anthropic with Claude,

104
00:03:30,854 --> 00:03:33,034
DeepSeek with R1,

105
00:03:33,655 --> 00:03:35,655
all all these different kinds of things that

106
00:03:35,655 --> 00:03:37,514
exist out there. So

107
00:03:38,134 --> 00:03:39,655
they're they're nice that you can run them

108
00:03:39,655 --> 00:03:40,474
in a service.

109
00:03:40,909 --> 00:03:43,389
And I think most of us have kind

110
00:03:43,389 --> 00:03:46,030
of grown accustomed to that, and and it's

111
00:03:46,030 --> 00:03:47,550
it's a place that most of us are

112
00:03:47,550 --> 00:03:49,629
comfortable. Like, we know how to sign in

113
00:03:49,629 --> 00:03:51,650
to ChatGPT on the web and maybe

114
00:03:51,870 --> 00:03:52,370
either

115
00:03:52,915 --> 00:03:56,034
have a chat with an LLM and and

116
00:03:56,034 --> 00:03:58,275
do some structured prompting and and try and

117
00:03:58,275 --> 00:03:59,895
get some responses out of it

118
00:04:00,194 --> 00:04:00,694
versus

119
00:04:01,395 --> 00:04:03,955
things like ChatGPT web search. And it's great.

120
00:04:03,955 --> 00:04:05,655
Right? It's it's all cloud based.

121
00:04:06,030 --> 00:04:07,870
Some of them are free. Some of them

122
00:04:07,870 --> 00:04:08,610
cost money.

123
00:04:09,069 --> 00:04:10,930
They really only start to get powerful

124
00:04:11,229 --> 00:04:14,189
when they do cost money. So now you're

125
00:04:14,189 --> 00:04:16,029
in the world where you're relying on this

126
00:04:16,029 --> 00:04:18,769
external service. You're gonna pay per request.

127
00:04:19,175 --> 00:04:22,055
And probably most importantly, there's a privacy angle

128
00:04:22,055 --> 00:04:24,714
here where you're sending your data out into

129
00:04:24,935 --> 00:04:27,095
the wild. Like, when you're chatting with

130
00:04:27,095 --> 00:04:28,475
ChatGPT in the web interface,

131
00:04:28,775 --> 00:04:30,855
you're passing that data to them. We saw

132
00:04:30,855 --> 00:04:33,095
this with DeepSeek. When DeepSeek kinda came out

133
00:04:33,095 --> 00:04:33,754
of the woodwork

134
00:04:34,099 --> 00:04:35,860
a couple weeks ago and the market freaked

135
00:04:35,860 --> 00:04:37,139
out. You know, they were about a month

136
00:04:37,139 --> 00:04:38,740
behind freaking out when it had actually been

137
00:04:38,740 --> 00:04:42,039
released. But that said, you know, DeepSeek immediately

138
00:04:42,099 --> 00:04:44,680
had a data leak and people broke in

139
00:04:44,819 --> 00:04:46,740
and they got all the usernames, they got

140
00:04:46,740 --> 00:04:48,795
the passwords, they got the prompts that were

141
00:04:48,795 --> 00:04:50,714
flowing through that system, things like that. So

142
00:04:50,714 --> 00:04:52,394
I think one of the most powerful things

143
00:04:52,394 --> 00:04:54,654
here is the ability to

144
00:04:55,514 --> 00:04:56,894
run a local LLM

145
00:04:57,274 --> 00:04:57,774
with

146
00:04:58,154 --> 00:05:00,329
data privacy in mind. So I'm gonna run

147
00:05:00,329 --> 00:05:02,329
these things locally. They're only going to be

148
00:05:02,329 --> 00:05:04,649
on my machine. They're not gonna communicate with

149
00:05:04,649 --> 00:05:06,810
the outside world. And then if you're in

150
00:05:06,810 --> 00:05:09,289
that world of, you know, being a little

151
00:05:09,289 --> 00:05:10,509
bit more cost conscious,

152
00:05:11,129 --> 00:05:13,370
you might wanna try some of these things

153
00:05:13,370 --> 00:05:15,915
out without paying per request in a service

154
00:05:15,915 --> 00:05:18,394
like ChatGPT, or Claude, or something

155
00:05:18,394 --> 00:05:21,035
like that. And in that world, you're gonna

156
00:05:21,035 --> 00:05:23,134
also have a cost savings angle.

157
00:05:23,514 --> 00:05:25,214
You're gonna have offline capabilities.

158
00:05:25,675 --> 00:05:28,095
So the ability to chat with these models

159
00:05:28,154 --> 00:05:30,470
locally can be a little bit interesting

160
00:05:30,930 --> 00:05:32,470
and and how all that composes.

161
00:05:32,930 --> 00:05:35,110
And, you know, I think the kicker is

162
00:05:35,490 --> 00:05:37,490
most of us are geeks, and we run

163
00:05:37,490 --> 00:05:40,129
around with these really powerful computers. You know,

164
00:05:40,129 --> 00:05:42,689
you've got a laptop with gobs and gobs

165
00:05:42,689 --> 00:05:45,214
of RAM on it, and it's running a

166
00:05:45,214 --> 00:05:48,254
modern processor, it's got a GPU, it's got

167
00:05:48,254 --> 00:05:48,995
an NPU,

168
00:05:49,694 --> 00:05:51,134
you know, you might be sitting there at

169
00:05:51,134 --> 00:05:52,895
home and you're like a PC gamer and

170
00:05:52,895 --> 00:05:54,274
that's how you,

171
00:05:54,574 --> 00:05:56,574
you know, just relax at the end of

172
00:05:56,574 --> 00:05:58,319
the day. Well, guess what? You got that

173
00:05:58,319 --> 00:06:00,639
monster GPU, you know, that 5090 or

174
00:06:00,639 --> 00:06:02,399
whatever that you can potentially use during the

175
00:06:02,399 --> 00:06:04,319
day with these things. And it turns out

176
00:06:04,319 --> 00:06:06,479
that you might actually chat with local LLMs,

177
00:06:06,479 --> 00:06:08,560
like, more than you think. You know, like

178
00:06:08,720 --> 00:06:10,740
so we've talked about how we're Apple users,

179
00:06:11,305 --> 00:06:14,185
so iOS, things like that. The predictive text

180
00:06:14,185 --> 00:06:17,004
on iOS is all based on an LLM.

181
00:06:17,144 --> 00:06:18,764
It's based on a transformer.

182
00:06:19,384 --> 00:06:21,144
So that thing is running a local model.

183
00:06:21,144 --> 00:06:23,589
Well, you can run those similar models on

184
00:06:23,589 --> 00:06:25,669
your side. So it gives you this really

185
00:06:25,669 --> 00:06:29,689
interesting opportunity to kinda take advantage of AI

186
00:06:30,149 --> 00:06:32,789
while maintaining the privacy aspects, maybe letting you

187
00:06:32,789 --> 00:06:34,229
play with new things. Like, if you wanna

188
00:06:34,229 --> 00:06:36,389
play with DeepSeek without signing up for the

189
00:06:36,389 --> 00:06:37,370
DeepSeek service,

190
00:06:37,685 --> 00:06:39,525
like, hey, that that that that's a great

191
00:06:39,525 --> 00:06:40,725
way to do it. So we'll talk a

192
00:06:40,725 --> 00:06:42,085
little bit about that and kind of some

193
00:06:42,085 --> 00:06:43,925
of the advantages and what you can get

194
00:06:43,925 --> 00:06:46,904
on with. We should also talk about what

195
00:06:47,525 --> 00:06:49,064
folks can actually run,

196
00:06:49,365 --> 00:06:52,439
like, what's useful that can run locally

197
00:06:52,439 --> 00:06:53,879
for you. So we're gonna talk a little

198
00:06:53,879 --> 00:06:56,779
bit about, like, parameter size in a model

199
00:06:56,839 --> 00:06:59,800
and how big these things are. So turns

200
00:06:59,800 --> 00:07:02,300
out there's a big difference between a

201
00:07:02,680 --> 00:07:06,044
1 billion parameter model, a 7 billion parameter model, a

202
00:07:06,044 --> 00:07:06,925
65 billion

203
00:07:06,925 --> 00:07:09,004
parameter model, or, you know, like I said,

204
00:07:09,004 --> 00:07:10,384
if you wanna play around with DeepSeek,

205
00:07:10,764 --> 00:07:12,444
I was watching some videos on YouTube of

206
00:07:12,444 --> 00:07:15,264
people who are playing around with some clustered

207
00:07:15,644 --> 00:07:16,144
servers

208
00:07:16,604 --> 00:07:18,285
to do, like, 400 billion

209
00:07:18,285 --> 00:07:20,660
parameter model runs. And, you know, you can't

210
00:07:20,660 --> 00:07:22,899
run, like, 400 billion parameters locally. You need, like,

211
00:07:22,899 --> 00:07:25,220
a distributed system, and, you know, you

212
00:07:25,220 --> 00:07:27,860
can potentially do it across a series of

213
00:07:27,860 --> 00:07:30,339
servers within your premises. But that said, like,

214
00:07:30,339 --> 00:07:32,339
those aren't for everybody. They're gonna be too

215
00:07:32,339 --> 00:07:35,060
slow, cost a bunch for the GPUs, things

216
00:07:35,060 --> 00:07:36,555
like that. So we'll talk a little bit

217
00:07:36,555 --> 00:07:38,954
about that, about like parameters and, you know,

218
00:07:38,954 --> 00:07:42,235
maybe where more parameters doesn't always mean, like,

219
00:07:42,235 --> 00:07:44,735
better results. I think that's important too.

220
00:07:45,035 --> 00:07:46,714
There there's a little bit of nuance and

221
00:07:46,714 --> 00:07:48,735
kind of trade off here between

222
00:07:49,240 --> 00:07:51,720
speed of response, like how many tokens can

223
00:07:51,720 --> 00:07:53,639
an LLM respond back to you with, what's

224
00:07:53,639 --> 00:07:56,519
the accuracy of that, and probably most importantly,

225
00:07:56,519 --> 00:07:58,279
like what are the compute requirements on your

226
00:07:58,279 --> 00:08:00,039
end. So like the things that I'm gonna

227
00:08:00,039 --> 00:08:01,800
talk about that I run today, so I

228
00:08:01,800 --> 00:08:03,819
rock an M-series MacBook Pro

229
00:08:04,295 --> 00:08:06,455
most of the time, and that's kind of

230
00:08:06,455 --> 00:08:07,975
like what I'm running on. And I've got,

231
00:08:07,975 --> 00:08:09,735
you know, 32 gigs of RAM in there,

232
00:08:09,735 --> 00:08:11,595
and and I'm all set in my world.

233
00:08:11,895 --> 00:08:14,214
You have a a different model on a

234
00:08:14,214 --> 00:08:15,035
different processor

235
00:08:15,574 --> 00:08:19,009
with more memory and potentially more GPUs, so

236
00:08:19,009 --> 00:08:20,529
you'll be able to run, like, maybe even,

237
00:08:20,529 --> 00:08:22,870
like, bigger things than I can run here.

238
00:08:22,930 --> 00:08:24,310
And that's okay. And then,

239
00:08:24,770 --> 00:08:27,089
you know, your mileage may vary. But it's

240
00:08:27,089 --> 00:08:28,930
kind of like anybody can get started with

241
00:08:28,930 --> 00:08:31,435
these things, even on, like, a little,

242
00:08:31,975 --> 00:08:34,294
you know, off the shelf NUC kind of

243
00:08:34,294 --> 00:08:37,254
PC or things like that. So beyond chatting

244
00:08:37,254 --> 00:08:38,075
with these things,

245
00:08:38,774 --> 00:08:40,615
you can also use them to empower your

246
00:08:40,615 --> 00:08:41,115
workflows.

247
00:08:41,654 --> 00:08:44,375
So you can use local AI models with

248
00:08:44,375 --> 00:08:46,830
Visual Studio Code. Like, you might set out

249
00:08:46,830 --> 00:08:47,809
and go and say,

250
00:08:48,590 --> 00:08:50,929
I'm coding a .NET application.

251
00:08:51,470 --> 00:08:54,830
Let me go find the best LLM model

252
00:08:54,830 --> 00:08:56,210
for .NET applications,

253
00:08:56,669 --> 00:08:58,074
but I don't wanna pay for it. I

254
00:08:58,074 --> 00:09:00,074
I don't wanna, like, go to OpenAI or

255
00:09:00,074 --> 00:09:02,154
Anthropic and and do the cloud thing, anything

256
00:09:02,154 --> 00:09:03,995
like that. Well, maybe you can go out

257
00:09:03,995 --> 00:09:05,754
and actually just download a model and run

258
00:09:05,754 --> 00:09:07,595
it locally, and we'll kind of talk about

259
00:09:07,595 --> 00:09:10,074
the hosting engines for these things that expose

260
00:09:10,074 --> 00:09:12,730
things like standard OpenAI endpoints. So you can

261
00:09:12,730 --> 00:09:15,629
literally point VS Code at a local LLM

262
00:09:15,769 --> 00:09:17,529
and have it write you PowerShell and all

263
00:09:17,529 --> 00:09:19,210
those things that are, like, private just to

264
00:09:19,210 --> 00:09:19,870
your machine

265
00:09:20,170 --> 00:09:23,210
without having to go out to the Internet

266
00:09:23,210 --> 00:09:25,154
and get those kinds of things done. So

267
00:09:25,235 --> 00:09:26,915
I think that's a fun little way to

268
00:09:26,915 --> 00:09:29,254
kind of think about integrating these things

269
00:09:29,715 --> 00:09:31,555
into your life and how they come together.

270
00:09:31,555 --> 00:09:33,154
So we just kind of want to go

271
00:09:33,154 --> 00:09:35,475
end to end and full circle between, can

272
00:09:35,475 --> 00:09:37,975
you run your own ChatGPT

273
00:09:38,675 --> 00:09:39,175
like

274
00:09:39,600 --> 00:09:40,259
thing, model

275
00:09:41,039 --> 00:09:43,600
locally? And the answer is yes. So, yeah,

276
00:09:43,600 --> 00:09:45,440
like we should just kind of have a

277
00:09:45,440 --> 00:09:45,940
conversation

278
00:09:46,799 --> 00:09:47,700
about that. So

279
00:09:48,399 --> 00:09:50,639
why don't we start with like the whole

280
00:09:50,639 --> 00:09:53,360
data and privacy cost efficiency thing and all

281
00:09:53,360 --> 00:09:54,904
that stuff? I think that's one of the

282
00:09:54,904 --> 00:09:56,924
ones that can be super important

283
00:09:58,024 --> 00:10:00,105
that people think about. And kinda like you

284
00:10:00,105 --> 00:10:02,345
said, the DeepSeek leak exposed millions of

285
00:10:02,345 --> 00:10:02,845
sensitive

286
00:10:03,225 --> 00:10:05,544
data records. One thing I've heard even when

287
00:10:05,544 --> 00:10:07,565
you start looking at things like ChatGPT

288
00:10:08,105 --> 00:10:08,605
versus

289
00:10:09,304 --> 00:10:11,759
Copilot and Microsoft 365 and going

290
00:10:11,759 --> 00:10:14,580
back to the local ones or doing OpenAI

291
00:10:14,720 --> 00:10:16,420
in Azure or

292
00:10:16,960 --> 00:10:19,680
something in AWS is it it very much

293
00:10:19,680 --> 00:10:21,279
goes back to where does that data go.

294
00:10:21,279 --> 00:10:23,680
Some people see rolling out Copilot as a

295
00:10:23,680 --> 00:10:26,365
security benefit because then they're not taking all

296
00:10:26,365 --> 00:10:27,585
that data from

297
00:10:27,965 --> 00:10:30,845
SharePoint, from Teams, from their Microsoft 365

298
00:10:30,845 --> 00:10:33,424
tenant, sending it out into ChatGPT

299
00:10:33,804 --> 00:10:36,524
where it's escaping that Microsoft 365

300
00:10:36,524 --> 00:10:39,470
boundary. OpenAI in Azure, same thing. If all

301
00:10:39,470 --> 00:10:42,210
your data's up in Azure somewhere, if you're

302
00:10:42,509 --> 00:10:45,309
working with Scott to store petabytes of data

303
00:10:45,309 --> 00:10:47,309
in blob storage and you want that to

304
00:10:47,309 --> 00:10:49,470
be used for OpenAI, you can do that.

305
00:10:49,470 --> 00:10:50,909
But then you do get into this local

306
00:10:50,909 --> 00:10:52,129
thing. All your data's

307
00:10:52,595 --> 00:10:54,835
local. Or one of the scenarios I have

308
00:10:54,835 --> 00:10:56,514
that we can put a link to is

309
00:10:56,514 --> 00:10:58,754
I use Home Assistant for all my smart

310
00:10:58,754 --> 00:10:59,894
home stuff because

311
00:11:00,355 --> 00:11:01,254
I like everything

312
00:11:01,555 --> 00:11:03,235
local. I don't want it all going out

313
00:11:03,235 --> 00:11:05,495
to relying on Samsung or

314
00:11:05,795 --> 00:11:08,370
any of those. What if you wanna integrate

315
00:11:08,590 --> 00:11:11,789
AI into your local smart home stuff and,

316
00:11:11,789 --> 00:11:14,129
again, you wanna keep it all internal? You're

317
00:11:14,190 --> 00:11:15,809
in an industry where

318
00:11:16,269 --> 00:11:18,590
you need to keep things on premises for

319
00:11:18,590 --> 00:11:21,115
some reason or certain regulations around that. I

320
00:11:21,115 --> 00:11:23,215
think there's a huge benefit to doing

321
00:11:23,595 --> 00:11:26,095
local AI, whether it's at that small

322
00:11:26,475 --> 00:11:27,375
in your house,

323
00:11:27,915 --> 00:11:29,855
you and I type scenario of

324
00:11:30,235 --> 00:11:33,215
smart home or something here or large enterprises

325
00:11:33,754 --> 00:11:38,220
that have very stringent data requirements and need

326
00:11:38,220 --> 00:11:40,460
to run it locally, maybe in their own

327
00:11:40,460 --> 00:11:43,519
data centers in clusters that they build internally

328
00:11:43,580 --> 00:11:45,820
and stuff. Home Assistant is a fun one.

329
00:11:45,820 --> 00:11:48,294
So if you think about AI and Home

330
00:11:48,294 --> 00:11:49,894
Assistant and what they're doing with like Home

331
00:11:49,894 --> 00:11:52,054
Assistant voice and some of those things, it

332
00:11:52,054 --> 00:11:55,414
relies on two paths. One is text to

333
00:11:55,414 --> 00:11:58,054
speech. So can I have Home Assistant talk

334
00:11:58,054 --> 00:11:59,735
to me? So some text goes in and

335
00:11:59,735 --> 00:12:01,195
can I have it talk back to me?

336
00:12:01,419 --> 00:12:03,980
And then it's also speech to text in

337
00:12:03,980 --> 00:12:05,600
the form of things maybe

338
00:12:05,980 --> 00:12:06,879
like Whisper,

339
00:12:07,259 --> 00:12:09,259
which is, you know, typically what I see

340
00:12:09,259 --> 00:12:11,660
integrated with most on that side. In fact,

341
00:12:11,660 --> 00:12:15,179
we use Whisper for generating transcripts sometimes for

342
00:12:15,179 --> 00:12:17,205
the show. So it's not just LLMs.

343
00:12:17,665 --> 00:12:19,665
It could be things like text to speech,

344
00:12:19,665 --> 00:12:22,165
speech to text. Could also be image generation.

345
00:12:22,225 --> 00:12:24,384
Like, if somebody is looking to, like, play

346
00:12:24,384 --> 00:12:26,945
around with Stable Diffusion, that runs pretty

347
00:12:26,945 --> 00:12:27,764
well locally

348
00:12:28,269 --> 00:12:29,870
on most of these things as well. It

349
00:12:29,870 --> 00:12:31,549
could be a little bit slow, but, hey,

350
00:12:31,549 --> 00:12:33,629
that that's okay. That's that's part of the

351
00:12:33,629 --> 00:12:35,950
trade off of not having to pay and

352
00:12:35,950 --> 00:12:38,110
and push these things through. But I I

353
00:12:38,110 --> 00:12:39,950
think the most important thing is just when

354
00:12:39,950 --> 00:12:44,274
you're running an LLM locally, you're basically mitigating

355
00:12:44,274 --> 00:12:45,495
a bunch of that risk

356
00:12:46,115 --> 00:12:48,834
of having to worry about compliance, having to

357
00:12:48,834 --> 00:12:51,735
worry about legal concerns. Like, hey, I'm submitting,

358
00:12:51,875 --> 00:12:54,115
like, this thing that's important to me. Like,

359
00:12:54,115 --> 00:12:56,274
I'm never like, for example, I'm never going

360
00:12:56,274 --> 00:12:57,495
to chat with my taxes

361
00:12:58,039 --> 00:13:00,439
with anything other than, like, a local LLM

362
00:13:00,439 --> 00:13:01,879
to help me break some of that stuff

363
00:13:01,879 --> 00:13:02,379
down.

364
00:13:03,159 --> 00:13:05,240
But, you know, somebody else might be out

365
00:13:05,240 --> 00:13:07,000
there, but good good luck when you're when

366
00:13:07,000 --> 00:13:08,759
you're in the next data breach or or

367
00:13:08,759 --> 00:13:11,345
or whatever happens. So there's things like that.

368
00:13:11,345 --> 00:13:12,945
I think the other one that's important to

369
00:13:12,945 --> 00:13:15,424
consider is kind of the cost angle of

370
00:13:15,424 --> 00:13:17,904
things. Like, I'll be the first to admit

371
00:13:17,904 --> 00:13:20,225
that I'm pretty frugal. So if you're thinking

372
00:13:20,225 --> 00:13:22,544
about maybe like OpenAI and having to go

373
00:13:22,544 --> 00:13:24,945
out and pay for OpenAI, and you're either

374
00:13:24,945 --> 00:13:27,129
paying per request or you're on one of

375
00:13:27,129 --> 00:13:28,889
the monthly plans. And those can get pretty

376
00:13:28,889 --> 00:13:30,250
expensive. Right? If you wanna get up there,

377
00:13:30,250 --> 00:13:31,769
you can spend up to, like, $200 a

378
00:13:31,769 --> 00:13:34,169
month. But typically, they're on the order of,

379
00:13:34,169 --> 00:13:34,669
like,

380
00:13:35,129 --> 00:13:38,990
you know, 1¢ US per 1,000 tokens.

381
00:13:39,365 --> 00:13:40,964
And then you're like, Well, what's a token?

382
00:13:40,964 --> 00:13:43,125
Like, how many words comprise a token? Like,

383
00:13:43,125 --> 00:13:44,164
it can be a little bit weird to

384
00:13:44,164 --> 00:13:45,845
figure out the pricing. So sometimes you just

385
00:13:45,845 --> 00:13:48,105
want to play around with these things locally

386
00:13:48,884 --> 00:13:50,824
without having that cost constraint,

387
00:13:51,204 --> 00:13:53,959
because costs can run away from you pretty

388
00:13:53,959 --> 00:13:56,620
quickly, especially if you're being like super chatty

389
00:13:57,000 --> 00:14:00,120
and doing longer chat threads and things like

390
00:14:00,120 --> 00:14:02,199
that. Or the other place they tend to

391
00:14:02,199 --> 00:14:03,179
get pretty expensive

392
00:14:03,559 --> 00:14:05,339
is if you're integrating

393
00:14:06,134 --> 00:14:06,634
these

394
00:14:07,095 --> 00:14:07,595
AIs

395
00:14:08,254 --> 00:14:08,754
into,

396
00:14:09,415 --> 00:14:11,815
like, your coding workflows, like, hey, you're you're

397
00:14:11,815 --> 00:14:12,855
out there and you're sitting there and you're

398
00:14:12,855 --> 00:14:15,095
like, I wanna vibe code. Well, great.

399
00:14:15,095 --> 00:14:17,654
When you're like vibe coding across 10,000 lines

400
00:14:17,654 --> 00:14:18,929
of code, it starts

401
00:14:19,309 --> 00:14:21,790
to add up and get pretty expensive. So

402
00:14:21,790 --> 00:14:24,350
you already bought this, you know, honking computer.

403
00:14:24,350 --> 00:14:26,370
You got a GPU. It's got CPU.

404
00:14:26,830 --> 00:14:28,990
It's got a fast disk. You might as

405
00:14:28,990 --> 00:14:30,850
well use it for a little bit more

406
00:14:31,154 --> 00:14:33,794
than just writing your PowerShell scripts. Like, why

407
00:14:33,794 --> 00:14:35,235
why are you sitting there writing in VS

408
00:14:35,235 --> 00:14:37,074
Code by hand when, you know, you could

409
00:14:37,074 --> 00:14:39,074
be just vibing your way through that stuff?

410
00:14:39,074 --> 00:14:41,014
For sure. And I think that's one thing.

411
00:14:41,074 --> 00:14:42,834
I guess I kind of always realized it

412
00:14:42,834 --> 00:14:44,834
in the back of my head, comparing local

413
00:14:44,834 --> 00:14:45,334
LLM

414
00:14:45,679 --> 00:14:48,559
to ChatGPT to Copilot to cloud based, it

415
00:14:48,559 --> 00:14:50,100
kinda struck me that

416
00:14:50,720 --> 00:14:53,120
from a pricing perspective, when you're using cloud

417
00:14:53,120 --> 00:14:55,759
based LLMs, you're not paying for the models.

418
00:14:55,759 --> 00:14:58,319
Like, these companies, these models are all out

419
00:14:58,319 --> 00:14:58,819
there,

420
00:14:59,120 --> 00:14:59,940
whether it's

421
00:15:00,424 --> 00:15:00,924
DeepSeek

422
00:15:01,945 --> 00:15:02,764
or Llama

423
00:15:03,144 --> 00:15:05,225
or any of those. What you're really paying

424
00:15:05,225 --> 00:15:06,764
for is the compute to

425
00:15:07,225 --> 00:15:09,784
process the request to these models, and that's

426
00:15:09,784 --> 00:15:11,225
where that cost comes in. Do you wanna

427
00:15:11,225 --> 00:15:13,625
spend it on on-premises hardware and hardware

428
00:15:13,625 --> 00:15:15,790
running in your house, or are you giving

429
00:15:15,790 --> 00:15:18,350
it to these cloud providers for the hardware

430
00:15:18,350 --> 00:15:21,470
out there running models that maybe you don't

431
00:15:21,470 --> 00:15:24,429
physically have the capability of running on your

432
00:15:24,429 --> 00:15:26,269
compute that you own? It is an interesting

433
00:15:26,269 --> 00:15:28,190
one. The other thing that, you know, like,

434
00:15:28,190 --> 00:15:29,629
once you get a little bit more advanced

435
00:15:29,629 --> 00:15:30,910
and you start going down the path of

436
00:15:30,910 --> 00:15:32,664
some of this stuff, if you really get

437
00:15:32,664 --> 00:15:35,065
into it, you start looking at things like

438
00:15:35,065 --> 00:15:35,804
fine tuning

439
00:15:36,184 --> 00:15:39,304
and doing RAG or retrieval augmented generation against

440
00:15:39,304 --> 00:15:41,144
things. So we'll put a link in the

441
00:15:41,144 --> 00:15:44,105
show notes to a NetworkChuck episode where

442
00:15:44,105 --> 00:15:46,610
he talks about running local LLMs. And one

443
00:15:46,610 --> 00:15:47,970
of the things that he does, he has

444
00:15:47,970 --> 00:15:50,370
this really interesting use case where when he

445
00:15:50,370 --> 00:15:53,110
attends church, all the sermons are transcribed,

446
00:15:53,649 --> 00:15:56,610
and he uses local LLMs to summarize the

447
00:15:56,610 --> 00:15:58,929
sermons for himself. Like, he doesn't always get

448
00:15:58,929 --> 00:16:00,769
to attend live, but he still wants to

449
00:16:00,769 --> 00:16:02,534
get the messaging out of it. So he

450
00:16:02,534 --> 00:16:05,095
does all that stuff like local LLM, and

451
00:16:05,095 --> 00:16:07,174
it's just all there ready to go. It

452
00:16:07,174 --> 00:16:08,154
does the transcription,

453
00:16:08,934 --> 00:16:10,855
like pulls it all off a YouTube thing,

454
00:16:10,855 --> 00:16:12,934
transcribes it, runs it through an LLM, gives

455
00:16:12,934 --> 00:16:14,730
him the summary, and then that summary is

456
00:16:14,730 --> 00:16:17,289
written back as a markdown file where it

457
00:16:17,289 --> 00:16:18,269
lands in Obsidian,

458
00:16:18,809 --> 00:16:21,289
and then he can just use his network

459
00:16:21,289 --> 00:16:23,450
brain in Obsidian to go and figure some

460
00:16:23,450 --> 00:16:25,049
of that stuff out too. So you can

461
00:16:25,049 --> 00:16:27,164
get pretty rich with these things if you

462
00:16:27,164 --> 00:16:29,644
start to kinda run through the use cases.

463
00:16:29,644 --> 00:16:31,804
So where, like, NetworkChuck might be doing that,

464
00:16:31,965 --> 00:16:34,044
I might be working on coding a new

465
00:16:34,044 --> 00:16:34,544
application,

466
00:16:35,004 --> 00:16:37,004
and I just want it to learn off

467
00:16:37,004 --> 00:16:39,009
maybe an existing code base from, like, the

468
00:16:39,009 --> 00:16:41,029
previous two versions or iterations

469
00:16:41,409 --> 00:16:43,089
or things like that that I did along

470
00:16:43,089 --> 00:16:44,529
the way. So you can also do these

471
00:16:44,529 --> 00:16:46,850
things like fine tuning and get up and

472
00:16:46,850 --> 00:16:47,350
running

473
00:16:47,889 --> 00:16:50,529
pretty pretty quickly. It's actually, like, turns out

474
00:16:50,529 --> 00:16:51,970
a lot of the tooling's already out there.

475
00:16:51,970 --> 00:16:53,350
Like, these things are

476
00:16:53,730 --> 00:16:56,115
not the hardest thing to stand up. But

477
00:16:56,115 --> 00:16:57,875
before we stand them up, we should also

478
00:16:57,875 --> 00:16:59,955
probably talk a little bit about, like, what

479
00:16:59,955 --> 00:17:02,455
kinds of models you can run

480
00:17:02,914 --> 00:17:05,494
because your mileage may vary here based on

481
00:17:05,795 --> 00:17:08,375
your your hardware and what's available to you,

482
00:17:08,650 --> 00:17:11,049
your your network bandwidths, and a couple other

483
00:17:11,049 --> 00:17:11,549
things.

484
00:17:15,450 --> 00:17:17,690
Do you feel overwhelmed by trying to manage

485
00:17:17,690 --> 00:17:19,929
your Office 365 environment? Are you

486
00:17:19,929 --> 00:17:23,230
facing unexpected issues that disrupt your company's productivity?

487
00:17:23,529 --> 00:17:25,474
Intelligink is here to help. Much like you

488
00:17:25,474 --> 00:17:27,394
take your car to the mechanic that has

489
00:17:27,394 --> 00:17:29,474
specialized knowledge on how to best keep your

490
00:17:29,474 --> 00:17:32,454
car running, Intelligink helps you with your Microsoft

491
00:17:32,515 --> 00:17:34,774
cloud environment because that's their expertise.

492
00:17:35,154 --> 00:17:37,470
Intelligink keeps up with the latest updates in

493
00:17:37,470 --> 00:17:39,630
the Microsoft cloud to help keep your business

494
00:17:39,630 --> 00:17:41,869
running smoothly and ahead of the curve. Whether

495
00:17:41,869 --> 00:17:43,869
you are a small organization with just a

496
00:17:43,869 --> 00:17:46,349
few users up to an organization of several

497
00:17:46,349 --> 00:17:47,329
thousand employees,

498
00:17:47,710 --> 00:17:49,710
they want to partner with you to implement

499
00:17:49,710 --> 00:17:52,450
and administer your Microsoft cloud technology.

500
00:17:53,204 --> 00:17:56,744
Visit them at intelligink.com/podcast.

501
00:17:56,964 --> 00:18:03,704
That's intelligink.com/podcast

502
00:18:04,085 --> 00:18:06,244
for more information or to schedule a thirty

503
00:18:06,244 --> 00:18:08,240
minute call to get started with them today.

504
00:18:08,539 --> 00:18:11,900
Remember, Intelligink focuses on the Microsoft cloud so

505
00:18:11,900 --> 00:18:13,680
you can focus on your business.

506
00:18:15,820 --> 00:18:17,900
So talking hardware, do you wanna dive into

507
00:18:17,900 --> 00:18:20,545
hardware or models? Where should we go? It's

508
00:18:20,545 --> 00:18:21,924
kinda like a both conversation.

509
00:18:22,305 --> 00:18:23,744
So I think we can cover kind of

510
00:18:23,744 --> 00:18:24,725
the whole parameterization

511
00:18:25,505 --> 00:18:26,005
question

512
00:18:26,545 --> 00:18:29,105
and how big these things are to run

513
00:18:29,105 --> 00:18:29,605
locally

514
00:18:29,984 --> 00:18:31,605
along with some of the hardware

515
00:18:31,904 --> 00:18:32,404
constraints

516
00:18:32,865 --> 00:18:33,365
that

517
00:18:33,769 --> 00:18:35,929
come along with them. So when you think

518
00:18:35,929 --> 00:18:37,849
about the models that you can run, one

519
00:18:37,849 --> 00:18:39,849
of the first things that's gonna happen is

520
00:18:39,849 --> 00:18:42,269
you might go out and grab Ollama,

521
00:18:42,569 --> 00:18:45,129
you might grab LM Studio. You're you're gonna

522
00:18:45,129 --> 00:18:47,845
grab some system that's going to let you

523
00:18:47,845 --> 00:18:48,345
basically

524
00:18:48,884 --> 00:18:51,684
run that model and be able to run

525
00:18:51,684 --> 00:18:54,644
prompts against it. So those models are

526
00:18:54,644 --> 00:18:55,704
gonna have different

527
00:18:56,005 --> 00:18:56,505
sizes,

528
00:18:57,044 --> 00:19:00,299
and those sizes equate back to parameters. So

529
00:19:00,460 --> 00:19:01,580
you're gonna go out and you're gonna see

530
00:19:01,580 --> 00:19:04,320
things like, oh, I wanna run Llama 3.

531
00:19:04,940 --> 00:19:06,960
And Llama 3 might have,

532
00:19:07,660 --> 00:19:10,460
you know, a 7 billion parameter model. It might

533
00:19:10,460 --> 00:19:12,380
have a 300 billion

534
00:19:12,380 --> 00:19:15,144
parameter model. It could have a 1 billion parameter.

535
00:19:15,144 --> 00:19:16,985
It could have something that's even smaller than

536
00:19:16,985 --> 00:19:19,144
that. So these things start to kind of

537
00:19:19,144 --> 00:19:21,384
become important. So if you're thinking about, like,

538
00:19:21,384 --> 00:19:23,865
parameters, number of parameters in a model, which

539
00:19:23,865 --> 00:19:25,404
is going to equate to

540
00:19:25,705 --> 00:19:26,924
kind of functionality

541
00:19:27,384 --> 00:19:28,445
within that model,

542
00:19:28,840 --> 00:19:31,400
in someplace like a 7 billion parameter model,

543
00:19:31,400 --> 00:19:33,480
if you're looking at, like, Llama 2

544
00:19:33,480 --> 00:19:36,600
7B, you're looking at Mistral 7B, like,

545
00:19:36,600 --> 00:19:38,519
those are pretty good starting points, and you

546
00:19:38,519 --> 00:19:41,160
don't need a super monster laptop or desktop

547
00:19:41,160 --> 00:19:43,184
to do it, just something decent. So if

548
00:19:43,184 --> 00:19:45,585
you have about 16 gigs of RAM and

549
00:19:45,585 --> 00:19:47,904
some CPU, you're good. Like, you don't need

550
00:19:47,904 --> 00:19:50,464
a dedicated GPU. You can absolutely do this

551
00:19:50,464 --> 00:19:51,444
stuff on CPU.

552
00:19:51,904 --> 00:19:54,144
I hesitate to say fast. It'll be fast

553
00:19:54,144 --> 00:19:56,150
ish. It might feel a little bit slow,

554
00:19:56,150 --> 00:19:58,069
like you'll see, like, the words typing out

555
00:19:58,069 --> 00:20:00,230
on screen, but that's okay. That that kind

556
00:20:00,230 --> 00:20:01,750
of equates to the experience that you might

557
00:20:01,750 --> 00:20:03,589
have in a ChatGPT or a

558
00:20:03,589 --> 00:20:05,369
Claude or things like that.

559
00:20:05,829 --> 00:20:07,450
But they're also super lightweight.

560
00:20:07,845 --> 00:20:10,565
So you you can get models that potentially

561
00:20:10,565 --> 00:20:12,404
when you download the model, they're measured in,

562
00:20:12,404 --> 00:20:13,704
like, hundreds of megs.

563
00:20:14,005 --> 00:20:15,845
Some are in the gigabyte range. Like, if

564
00:20:15,845 --> 00:20:17,224
you're in, like, a 7 billion

565
00:20:17,444 --> 00:20:19,605
parameter model, you're talking about maybe, like, two

566
00:20:19,605 --> 00:20:22,390
to three gigs of downloading a quantized model

567
00:20:22,529 --> 00:20:25,569
and being able to chat against it. And

568
00:20:25,569 --> 00:20:28,470
with 7 billion parameters, you'll probably find

569
00:20:28,769 --> 00:20:31,910
that they're good enough for most tasks,

570
00:20:32,529 --> 00:20:33,109
for most

571
00:20:33,464 --> 00:20:36,744
personal tasks. Hey. Summarize this for me. Hey.

572
00:20:36,744 --> 00:20:38,444
Give me a quick idea of this.

573
00:20:38,904 --> 00:20:41,544
Translate this to this. Like, those kinds of

574
00:20:41,544 --> 00:20:43,644
things, it's perfect. Hey. I wanna pump in

575
00:20:43,865 --> 00:20:46,424
the transcript from a YouTube video and have

576
00:20:46,424 --> 00:20:48,569
a local model summarize it for me. That's

577
00:20:48,569 --> 00:20:50,970
an awesome job for, like, a 3 or 7 billion

578
00:20:50,970 --> 00:20:52,029
parameter model,

579
00:20:52,409 --> 00:20:54,169
things like that. You can get a little

580
00:20:54,169 --> 00:20:56,569
bit bigger, and a little bit bigger is

581
00:20:56,569 --> 00:20:58,809
typically gonna be in the something of, like,

582
00:20:58,809 --> 00:21:00,109
10 to 30 billion

583
00:21:00,169 --> 00:21:01,230
parameter range.

584
00:21:01,755 --> 00:21:02,255
So

585
00:21:02,634 --> 00:21:04,815
now you're getting a little bit more honking.

586
00:21:04,954 --> 00:21:07,994
You're actually gonna need some GPU here, and

587
00:21:07,994 --> 00:21:10,234
you're probably gonna need more RAM as well.

588
00:21:10,234 --> 00:21:11,914
So, like, 16 gigs of RAM isn't gonna

589
00:21:11,914 --> 00:21:13,994
cut it. You're probably gonna need something closer

590
00:21:13,994 --> 00:21:15,615
to 32 gigs of RAM.

591
00:21:16,000 --> 00:21:18,480
You're gonna need some kind of GPU to

592
00:21:18,480 --> 00:21:19,380
drive that.

593
00:21:20,000 --> 00:21:21,359
You know, I think you could maybe get

594
00:21:21,359 --> 00:21:23,919
by on, like, an RTX 3090 or

595
00:21:23,919 --> 00:21:25,759
something like that. You'd probably wanna be in,

596
00:21:25,759 --> 00:21:27,519
like, a 40 series, like, a

597
00:21:27,519 --> 00:21:29,975
4060, 4070. Or if you're all

598
00:21:29,975 --> 00:21:31,174
on board and, like I said, you're a

599
00:21:31,174 --> 00:21:33,095
PC gamer and you've got that 5090

600
00:21:33,095 --> 00:21:35,674
sitting in there, like, go ahead. Use it.

601
00:21:35,815 --> 00:21:37,575
It's ready to go. Nobody has the 50

602
00:21:37,575 --> 00:21:39,095
series. There were only, like, 10 of them

603
00:21:39,095 --> 00:21:40,695
produced and nobody could buy them. Well, and

604
00:21:40,695 --> 00:21:41,975
out of the 10 that were produced, 10

605
00:21:41,975 --> 00:21:43,894
out of 10 were broken, so the the

606
00:21:43,894 --> 00:21:46,049
yields are great. And melted power cables. Okay.

607
00:21:46,049 --> 00:21:48,210
Anyways, sidetracked. Yes. But you're gonna need one

608
00:21:48,210 --> 00:21:49,890
of those high end GPUs. Yeah. Well, you're

609
00:21:49,890 --> 00:21:51,730
gonna need a GPU. Like, I think the

610
00:21:51,730 --> 00:21:54,369
difference between, like, a 3 or 7 billion parameter model

611
00:21:54,369 --> 00:21:55,890
and then you get up to those, like,

612
00:21:55,890 --> 00:21:57,255
10 to 30 range

613
00:21:57,575 --> 00:21:59,494
is, do I need a GPU or do

614
00:21:59,494 --> 00:22:00,634
I not need a GPU?

615
00:22:00,934 --> 00:22:02,775
So you can do the smaller models just

616
00:22:02,775 --> 00:22:04,535
with CPU as long as you have enough

617
00:22:04,535 --> 00:22:06,934
RAM. At some point, you're gonna want GPU

618
00:22:06,934 --> 00:22:10,234
as well to go ahead and offload those.

619
00:22:10,460 --> 00:22:12,220
So if you're thinking like, hey, my use

620
00:22:12,220 --> 00:22:14,880
case for running a local LLM is doing

621
00:22:15,019 --> 00:22:17,759
advanced coding, like, I'm I'm beyond, like, summarization,

622
00:22:17,900 --> 00:22:19,579
and I want this thing to help me

623
00:22:19,579 --> 00:22:20,319
write applications,

624
00:22:20,619 --> 00:22:23,924
PowerShell scripts, bash scripts, anything like that, you're

625
00:22:24,085 --> 00:22:26,244
probably gonna wanna be in that range where

626
00:22:26,244 --> 00:22:28,244
you've got a little bit more RAM and

627
00:22:28,244 --> 00:22:29,225
you've got a GPU,

628
00:22:29,765 --> 00:22:31,365
and then you kinda find the model that

629
00:22:31,365 --> 00:22:33,205
you like, and and that ends up being

630
00:22:33,205 --> 00:22:35,605
your sweet spot there. After that, you get

631
00:22:35,605 --> 00:22:37,700
into, like, the big, big models. So you're

632
00:22:37,700 --> 00:22:40,099
into, like, 65. I think I was watching

633
00:22:40,099 --> 00:22:42,099
another NetworkChuck video. He ran one on a

634
00:22:42,099 --> 00:22:42,599
cluster

635
00:22:42,900 --> 00:22:44,980
of Mac Studios. I think it was the

636
00:22:44,980 --> 00:22:46,900
M1 Studios. It was, like, a cluster

637
00:22:46,900 --> 00:22:48,500
of, like, six of those where he was

638
00:22:48,500 --> 00:22:50,660
able to run, like, a 400 billion parameter model,

639
00:22:50,660 --> 00:22:52,599
but it was only able to output context

640
00:22:53,164 --> 00:22:54,544
you know, like one

641
00:22:55,085 --> 00:22:55,585
word

642
00:22:55,884 --> 00:22:58,684
a second. Like, it's just so slow that

643
00:22:58,684 --> 00:23:01,005
it's that it's not actually useful. Right. So

644
00:23:01,005 --> 00:23:02,605
slow. A few times it looked like it

645
00:23:02,605 --> 00:23:04,625
even got stuck and,

646
00:23:05,005 --> 00:23:07,940
yeah, it was it was interesting. We'll put

647
00:23:07,940 --> 00:23:08,980
a link to that video in the show

648
00:23:08,980 --> 00:23:10,340
notes too. Yeah. So the way I think

649
00:23:10,340 --> 00:23:13,400
about that, the really big models, they're basically

650
00:23:13,940 --> 00:23:16,100
not there for, like, the faint of heart.

651
00:23:16,100 --> 00:23:17,460
They're there if you know what you're doing,

652
00:23:17,460 --> 00:23:19,619
if you've got the hardware to back it,

653
00:23:19,619 --> 00:23:20,440
both CPU,

654
00:23:20,980 --> 00:23:21,480
RAM,

655
00:23:22,394 --> 00:23:23,295
and and GPU.

656
00:23:23,755 --> 00:23:25,275
So if you think about it, like, there's

657
00:23:25,275 --> 00:23:26,394
kinda like a way that you can just

658
00:23:26,394 --> 00:23:27,755
break it down into a simple set of,

659
00:23:27,755 --> 00:23:29,755
like, pros and cons. So when you're sitting

660
00:23:29,755 --> 00:23:32,234
out there, you're in that, like, three, five,

661
00:23:32,234 --> 00:23:33,535
seven billion range,

662
00:23:34,075 --> 00:23:36,269
that's gonna be fast. You can do it

663
00:23:36,269 --> 00:23:37,649
on simple low hardware,

664
00:23:37,950 --> 00:23:39,470
or you can even do it on beefier

665
00:23:39,470 --> 00:23:41,069
hardware. Like in my case, like when I'm

666
00:23:41,069 --> 00:23:43,710
on my M1 Max, typically, I'm also running

667
00:23:43,710 --> 00:23:46,109
Windows in a virtual machine. So that's typically

668
00:23:46,109 --> 00:23:48,109
got half my RAM already. And then I've

669
00:23:48,109 --> 00:23:49,470
got a little bit of RAM that's going

670
00:23:49,470 --> 00:23:50,714
to the OS and things like that as

671
00:23:50,714 --> 00:23:52,474
well. So even if I could run a

672
00:23:52,474 --> 00:23:55,194
bigger model, I'm not going to because I'm

673
00:23:55,194 --> 00:23:57,755
still having resource contention and other things. Like,

674
00:23:57,755 --> 00:23:59,194
sometimes I don't wanna shut down my VM

675
00:23:59,194 --> 00:24:00,634
or I don't wanna shut down VS Code

676
00:24:00,634 --> 00:24:02,154
because I'm I'm using those things. Right. You

677
00:24:02,154 --> 00:24:03,980
know, smaller models, fast,

678
00:24:04,359 --> 00:24:05,420
commodity hardware,

679
00:24:06,119 --> 00:24:07,500
good enough for

680
00:24:07,799 --> 00:24:11,000
easy tasks. Like, that summarize that transcript for

681
00:24:11,000 --> 00:24:13,500
me thing, they're gonna be great for that.

682
00:24:13,559 --> 00:24:15,640
You get into that middle range, probably your

683
00:24:15,640 --> 00:24:16,920
sweet spot, like, if you do have a

684
00:24:16,920 --> 00:24:18,779
little GPU to drive these things,

685
00:24:19,125 --> 00:24:19,865
good accuracy,

686
00:24:20,244 --> 00:24:21,545
more context awareness,

687
00:24:22,164 --> 00:24:25,125
and kinda longer context windows. So as you're

688
00:24:25,125 --> 00:24:26,984
chatting with these things, they can remember,

689
00:24:27,285 --> 00:24:29,684
quote, unquote, big air quotes here. They can

690
00:24:29,684 --> 00:24:32,149
remember what you previously typed with them. So

691
00:24:32,149 --> 00:24:34,710
having bigger context windows and and more RAM

692
00:24:34,710 --> 00:24:36,869
and VRAM from your GPUs to host those

693
00:24:36,869 --> 00:24:39,529
context windows in becomes a little bit important.

694
00:24:39,750 --> 00:24:41,210
And then, like, if you're,

695
00:24:41,990 --> 00:24:43,829
you know, a monster gamer, you've got just

696
00:24:43,829 --> 00:24:45,190
a bunch of these things laying around and

697
00:24:45,190 --> 00:24:47,414
you wanna network them all together, it's super

698
00:24:47,414 --> 00:24:48,774
easy to do that too if you got

699
00:24:48,774 --> 00:24:50,154
enough hardware running around,

700
00:24:50,615 --> 00:24:52,075
and and you can go and,

701
00:24:52,855 --> 00:24:55,335
make that happen. So once you've kinda figured

702
00:24:55,335 --> 00:24:57,575
out your your hardware and you've got a

703
00:24:57,575 --> 00:24:59,014
sense for what you wanna do and what

704
00:24:59,014 --> 00:25:01,190
you're gonna be able to run locally, well,

705
00:25:01,250 --> 00:25:03,349
then you need a way to

706
00:25:03,730 --> 00:25:04,470
run these

707
00:25:04,849 --> 00:25:05,349
things

708
00:25:05,809 --> 00:25:08,289
locally, which, you know, it's not a little

709
00:25:08,289 --> 00:25:10,210
decision to make. Right. And another thing about

710
00:25:10,210 --> 00:25:12,529
the hardware that I found interesting watching the

711
00:25:12,529 --> 00:25:15,644
NetworkChuck videos as well was because we talked

712
00:25:15,644 --> 00:25:17,904
about the Macs, Macs have, like, that shared

713
00:25:17,965 --> 00:25:20,945
memory. They don't have dedicated video memory and

714
00:25:21,005 --> 00:25:23,404
system memory. So one thing he was doing

715
00:25:23,404 --> 00:25:25,325
was when he was running these models, like,

716
00:25:25,325 --> 00:25:28,200
all the memory was going to process the

717
00:25:28,200 --> 00:25:31,419
model because it doesn't have, like, those physical

718
00:25:31,480 --> 00:25:31,980
boundaries

719
00:25:32,279 --> 00:25:32,779
between

720
00:25:33,400 --> 00:25:35,000
video and system memory. So I think that

721
00:25:35,000 --> 00:25:36,679
was another thing to watch out for. And

722
00:25:36,679 --> 00:25:38,539
the other thing, because you mentioned networking,

723
00:25:38,919 --> 00:25:40,759
he also found, like, running a 10 gig

724
00:25:40,759 --> 00:25:43,115
network. Something I didn't realize because I've never

725
00:25:43,115 --> 00:25:45,515
done this locally, how chatty these are if

726
00:25:45,515 --> 00:25:47,674
you're running a cluster over a network. Super

727
00:25:47,674 --> 00:25:50,815
chatty. He'd, like, saturated his 10 gig network,

728
00:25:51,115 --> 00:25:53,355
and that appeared I would say, I don't

729
00:25:53,355 --> 00:25:55,035
know that it was definitive in his videos,

730
00:25:55,035 --> 00:25:56,494
but appeared to be the bottleneck

731
00:25:56,849 --> 00:25:57,670
using these,

732
00:25:58,450 --> 00:25:59,429
clustered studios.

733
00:25:59,730 --> 00:26:02,609
So then he switched to Thunderbolt, which gave

734
00:26:02,609 --> 00:26:05,490
him a 40 gig network essentially. And even

735
00:26:05,490 --> 00:26:07,809
that, he managed to saturate, get a little

736
00:26:07,809 --> 00:26:09,829
bit more speed out of it using Thunderbolt

737
00:26:09,890 --> 00:26:12,194
as opposed to a 10 gig network. But

738
00:26:12,194 --> 00:26:14,534
if you do start thinking of clustering

739
00:26:14,994 --> 00:26:15,894
larger models,

740
00:26:16,274 --> 00:26:19,075
networking is also huge when it comes into

741
00:26:19,075 --> 00:26:20,994
the hardware for these things. I don't really

742
00:26:20,994 --> 00:26:23,234
get into the network model kind of thing.

743
00:26:23,234 --> 00:26:25,075
Like, I just don't have enough hardware running

744
00:26:25,075 --> 00:26:26,890
around here at home to do it. I

745
00:26:26,890 --> 00:26:29,130
I certainly think it's interesting if you can

746
00:26:29,130 --> 00:26:31,450
get there. So, yeah, we can kinda talk

747
00:26:31,450 --> 00:26:33,369
about that maybe with, like, more advanced stuff.

748
00:26:33,369 --> 00:26:35,609
Yes. So on to software. So you got

749
00:26:35,609 --> 00:26:38,009
your hardware. You got your software. I keep

750
00:26:38,009 --> 00:26:39,865
seeing you sent LM Studio. I've not looked

751
00:26:39,865 --> 00:26:41,865
at LM Studio. The one that always seems

752
00:26:41,865 --> 00:26:43,804
to pop up for me both in

753
00:26:44,105 --> 00:26:45,704
the Home Assistant as well as in a

754
00:26:45,704 --> 00:26:47,865
lot of the NetworkChuck videos is Ollama for

755
00:26:47,865 --> 00:26:50,765
running these locally. Your decision here is

756
00:26:51,144 --> 00:26:53,144
how geeky do you wanna be and and

757
00:26:53,144 --> 00:26:54,204
what is your workflow?

758
00:26:54,679 --> 00:26:56,619
So if your primary workflow

759
00:26:57,079 --> 00:26:57,579
is

760
00:26:58,119 --> 00:26:59,259
you just want to

761
00:26:59,720 --> 00:27:02,119
chat with a a chatbot, like, you wanna

762
00:27:02,119 --> 00:27:03,960
hop right in, you wanna download a model,

763
00:27:03,960 --> 00:27:05,400
and you wanna be able to chat right

764
00:27:05,400 --> 00:27:07,480
away in, like, a nice GUI and a

765
00:27:07,480 --> 00:27:08,539
graphical interface,

766
00:27:08,924 --> 00:27:11,245
LM Studio is great for that. There's like,

767
00:27:11,245 --> 00:27:12,285
if you go out and you look this

768
00:27:12,285 --> 00:27:14,045
stuff up and you hop on Reddit or

769
00:27:14,045 --> 00:27:15,025
things like that,

770
00:27:15,485 --> 00:27:17,485
there there's going to be that set of

771
00:27:17,485 --> 00:27:19,825
folks out there who hate LM Studio

772
00:27:20,205 --> 00:27:22,369
because it's closed source,

773
00:27:22,670 --> 00:27:24,349
but, you know, I I'm just looking to

774
00:27:24,349 --> 00:27:26,190
play with these things. So for what I

775
00:27:26,190 --> 00:27:27,809
wanna do, it certainly

776
00:27:28,509 --> 00:27:29,009
works

777
00:27:29,309 --> 00:27:31,390
works great. Comes together, does what I need

778
00:27:31,390 --> 00:27:33,950
it to do. That said, you can also

779
00:27:33,950 --> 00:27:35,089
do Ollama,

780
00:27:35,695 --> 00:27:38,414
and Ollama is gonna be more command line

781
00:27:38,414 --> 00:27:40,734
driven, like you're gonna do more installations from

782
00:27:40,734 --> 00:27:42,335
the command line, you're even gonna download your

783
00:27:42,335 --> 00:27:44,355
models from the command line, so you're kinda

784
00:27:44,575 --> 00:27:46,975
trading off ease of use there. There's pros

785
00:27:46,975 --> 00:27:48,654
and cons to both depending on what you're

786
00:27:48,654 --> 00:27:50,639
doing. LM Studio is great if you just

787
00:27:50,639 --> 00:27:52,639
want to chat, you want to immediately have

788
00:27:52,639 --> 00:27:54,179
OpenAI-spec'd endpoints

789
00:27:54,880 --> 00:27:57,039
exposed maybe to things like VS Code locally,

790
00:27:57,039 --> 00:27:58,399
and you just don't want to wire anything

791
00:27:58,399 --> 00:27:59,919
up. You're looking for just like a one

792
00:27:59,919 --> 00:28:01,919
shot install, and you're going to be one

793
00:28:01,919 --> 00:28:03,244
and done. The other way you can do

794
00:28:03,244 --> 00:28:05,565
it is you can go to Ollama, and

795
00:28:05,565 --> 00:28:07,804
you can find your model that you wanna

796
00:28:07,804 --> 00:28:09,884
run on there. So, you know, I wanna

797
00:28:09,884 --> 00:28:12,524
run Llama 2 7 billion, and you'll go

798
00:28:12,524 --> 00:28:14,764
download that, and you're gonna do all this

799
00:28:14,764 --> 00:28:16,919
from the command line. Now you wanna chat

800
00:28:16,919 --> 00:28:17,740
with that thing.

801
00:28:18,200 --> 00:28:18,700
Well,

802
00:28:19,000 --> 00:28:20,919
you can certainly chat with it from the

803
00:28:20,919 --> 00:28:23,480
command line. That that's totally a possibility. If

804
00:28:23,480 --> 00:28:25,659
if that's your jam or your jelly, awesome.

805
00:28:25,720 --> 00:28:27,880
Go for it. But if you want to

806
00:28:27,880 --> 00:28:29,559
chat with it in a GUI, now you

807
00:28:29,559 --> 00:28:31,424
gotta go install something else. Like, you might

808
00:28:31,424 --> 00:28:32,644
have to go install

809
00:28:33,184 --> 00:28:34,244
Open WebUI

810
00:28:34,704 --> 00:28:36,865
to to to get that piece going and

811
00:28:36,865 --> 00:28:38,944
and stand all that up. So it's not

812
00:28:38,944 --> 00:28:41,345
like it's hard to do. It's just your

813
00:28:41,345 --> 00:28:43,345
your flavor and and and where you sit

814
00:28:43,345 --> 00:28:44,565
and where you wanna land.

815
00:28:44,950 --> 00:28:46,309
You know, if I'm looking to just do

816
00:28:46,309 --> 00:28:48,150
things quickly and, like, I'm just in there

817
00:28:48,150 --> 00:28:50,549
to maybe, like, oh, hey. I see Microsoft

818
00:28:50,549 --> 00:28:52,970
released a new model for Phi-4,

819
00:28:53,269 --> 00:28:55,590
and they they were they, you know, just

820
00:28:55,590 --> 00:28:57,590
pushed new models for Phi-3 and Phi-4,

821
00:28:57,590 --> 00:28:59,690
and I I wanna compare those two things.

822
00:29:00,025 --> 00:29:02,265
I'll probably just spin those up in LM

823
00:29:02,265 --> 00:29:04,664
Studio. Super easy. Next, next, next my way

824
00:29:04,664 --> 00:29:06,025
through it. I don't have to remember a

825
00:29:06,025 --> 00:29:08,365
bunch of command line parameters, things like that.

826
00:29:08,424 --> 00:29:11,065
If I'm doing more like application development and

827
00:29:11,065 --> 00:29:13,059
I'm thinking about, like, hey. I want to

828
00:29:13,059 --> 00:29:14,340
stand this thing up. I wanna have it

829
00:29:14,340 --> 00:29:16,200
running in the background. I want some endpoints

830
00:29:16,259 --> 00:29:17,779
that are exposed. Maybe I can build, like,

831
00:29:17,779 --> 00:29:19,779
an app that's doing, like, some light RAG

832
00:29:19,779 --> 00:29:21,700
or some fine tuning on top of it,

833
00:29:21,700 --> 00:29:23,460
and I've got, like, a Python script over

834
00:29:23,460 --> 00:29:24,920
here that needs to talk to the model.

835
00:29:25,059 --> 00:29:28,440
Awesome. Great. Like, that's that's where Ollama sits,

836
00:29:28,795 --> 00:29:31,595
and it has its space ready to go

837
00:29:31,595 --> 00:29:34,154
for you. So much like picking a model

838
00:29:34,154 --> 00:29:36,075
size, you're you're just doing a pros and

839
00:29:36,075 --> 00:29:37,355
cons and a little bit of a trade

840
00:29:37,355 --> 00:29:39,595
off thing. So Ollama, if you want a

841
00:29:39,595 --> 00:29:42,394
simple command line experience and you're comfortable at

842
00:29:42,394 --> 00:29:45,609
the terminal, go for it. Windows, macOS, Linux,

843
00:29:45,609 --> 00:29:47,930
it's all there. LM Studio, if you're not

844
00:29:47,930 --> 00:29:51,049
opposed to closed source and you just want

845
00:29:51,049 --> 00:29:52,809
a GUI from the start for all the

846
00:29:52,809 --> 00:29:55,390
things, for downloading, for chatting, for,

847
00:29:56,009 --> 00:29:58,934
all all that stuff. Again, macOS, Windows, Linux,

848
00:29:59,015 --> 00:30:01,335
ready to go. It's just closed source versus

849
00:30:01,335 --> 00:30:03,174
open source is really how I think about

850
00:30:03,174 --> 00:30:05,575
it. And then if you really do go

851
00:30:05,575 --> 00:30:08,055
down the Ollama path, you're probably gonna end

852
00:30:08,055 --> 00:30:09,735
up in a space where you wanna run

853
00:30:09,735 --> 00:30:12,089
a local chat UI, like a web based

854
00:30:12,490 --> 00:30:13,789
chatbot style thing,

855
00:30:14,169 --> 00:30:14,669
and

856
00:30:15,049 --> 00:30:17,609
then you'll just use something like Open Web

857
00:30:17,609 --> 00:30:19,849
UI for that. And, again, super easy to

858
00:30:19,849 --> 00:30:22,589
install. You're just basically hosting a little

859
00:30:22,970 --> 00:30:25,369
a little web server locally that knows how

860
00:30:25,369 --> 00:30:25,869
to

861
00:30:26,394 --> 00:30:29,454
chat with chat with that model. And then

862
00:30:29,674 --> 00:30:31,035
it could be a little bit different depending

863
00:30:31,035 --> 00:30:32,715
on like the extension tooling that you're going

864
00:30:32,715 --> 00:30:34,075
to use from there. So I talked about

865
00:30:34,075 --> 00:30:36,234
maybe like integrating VS Code with one of

866
00:30:36,234 --> 00:30:36,974
these locally.

867
00:30:37,434 --> 00:30:38,974
So if you're doing

868
00:30:39,490 --> 00:30:41,730
VS Code, you're gonna typically go grab an

869
00:30:41,730 --> 00:30:43,909
extension. So there's things like CodeGPT,

870
00:30:44,369 --> 00:30:45,909
there's Continue.dev,

871
00:30:46,210 --> 00:30:48,929
there's an Ollama extension, which can actually just

872
00:30:48,929 --> 00:30:50,869
talk natively to your Ollama endpoint.

873
00:30:51,329 --> 00:30:54,309
Or like I said, LM Studio exposes OpenAI

874
00:30:55,005 --> 00:30:55,505
compatible

875
00:30:56,045 --> 00:30:58,224
endpoints. So that's kind of a known, like,

876
00:30:58,845 --> 00:31:00,605
you know, web interface that you can throw

877
00:31:00,605 --> 00:31:02,464
a request at in a structured way,

878
00:31:02,765 --> 00:31:04,765
and it will respond in a in a

879
00:31:04,765 --> 00:31:06,464
way that most of the extensions

880
00:31:07,085 --> 00:31:09,265
are going to understand

881
00:31:09,644 --> 00:31:11,140
and get you ramped up for and ready

882
00:31:11,140 --> 00:31:13,240
to go with. Yeah, looking through this and

883
00:31:13,539 --> 00:31:14,900
most of the videos I saw, and again,

884
00:31:14,900 --> 00:31:17,539
were all Ollama, even the command line based

885
00:31:17,539 --> 00:31:18,839
looked really

886
00:31:19,299 --> 00:31:22,200
simple, lots of guides to just walk through,

887
00:31:22,579 --> 00:31:24,524
type this in, this is how you tie

888
00:31:24,524 --> 00:31:25,884
that in, this is how you go stand

889
00:31:25,884 --> 00:31:26,704
up the WebUI,

890
00:31:27,164 --> 00:31:29,825
point WebUI to all of those.

891
00:31:30,204 --> 00:31:31,424
So none of this

892
00:31:31,964 --> 00:31:34,444
really seemed that complicated in everything I watched

893
00:31:34,444 --> 00:31:36,605
and, again, made me excited, like, I need

894
00:31:36,605 --> 00:31:38,980
to go try this out and go find

895
00:31:38,980 --> 00:31:40,279
a computer that I can

896
00:31:40,660 --> 00:31:43,220
absolutely bury with a model. See what I

897
00:31:43,220 --> 00:31:44,660
can do. See what damage I can do

898
00:31:44,660 --> 00:31:46,900
to my computer, Scott. It is not hard

899
00:31:46,900 --> 00:31:48,420
to do. So the other thing that you

900
00:31:48,420 --> 00:31:50,180
can do, if you're comfortable on the command

901
00:31:50,180 --> 00:31:52,954
line, there's another project out there that's called

902
00:31:52,954 --> 00:31:53,454
Fabric.

903
00:31:53,835 --> 00:31:55,775
So Fabric is kind of a

904
00:31:56,794 --> 00:31:59,454
it allows you to easily network and

905
00:31:59,835 --> 00:32:02,075
distribute traffic across multiple nodes, but you can

906
00:32:02,075 --> 00:32:03,539
also do it on a single node. So

907
00:32:03,619 --> 00:32:05,299
I was talking earlier about, like, that,

908
00:32:05,299 --> 00:32:07,940
you know, sermon summarization thing. Yep. And that's

909
00:32:07,940 --> 00:32:10,259
all based on Fabric. So Fabric, again, command

910
00:32:10,259 --> 00:32:12,660
line, it can run with local LLMs. It's

911
00:32:12,660 --> 00:32:13,720
a little kinda

912
00:32:14,100 --> 00:32:15,859
opaque for how it does it. So,

913
00:32:15,859 --> 00:32:17,299
you know, make sure you download one of

914
00:32:17,299 --> 00:32:19,825
the newer versions of it. And Fabric

915
00:32:19,825 --> 00:32:21,424
is all run from the command line as

916
00:32:21,424 --> 00:32:24,065
well. But then you can super easily integrate

917
00:32:24,065 --> 00:32:27,105
Fabric into things like bash scripts. So, like,

918
00:32:27,105 --> 00:32:29,505
I use it for the same thing. Like,

919
00:32:29,505 --> 00:32:31,264
if I think about the podcast, I

920
00:32:31,264 --> 00:32:33,450
just have a bash script that runs Whisper

921
00:32:33,450 --> 00:32:35,230
locally. So Whisper is

922
00:32:36,089 --> 00:32:38,349
a speech to text model Yep. That OpenAI,

923
00:32:38,410 --> 00:32:39,929
and I can run that locally. Like, that

924
00:32:39,929 --> 00:32:41,929
runs on my hardware just fine. So I've

925
00:32:41,929 --> 00:32:43,529
just got a little bash script that takes

926
00:32:43,529 --> 00:32:45,884
that, generates the transcript, and then I just

927
00:32:45,964 --> 00:32:48,765
pipe the summaries out into Fabric to have

928
00:32:48,765 --> 00:32:49,585
those for myself

929
00:32:49,964 --> 00:32:51,644
in just my notes on the side. Right?

930
00:32:51,644 --> 00:32:53,164
Like, hey, here's the things we talked about

931
00:32:53,164 --> 00:32:55,825
and how they're coming together. So

932
00:32:56,285 --> 00:32:58,684
very, very, very easy to get on with

933
00:32:58,684 --> 00:33:00,039
this stuff. And I think for most of

934
00:33:00,039 --> 00:33:01,559
our audience as well, like you folks are

935
00:33:01,559 --> 00:33:03,160
all comfortable on the command line. You don't

936
00:33:03,160 --> 00:33:04,680
need a GUI for this stuff. You can

937
00:33:04,680 --> 00:33:05,740
follow some instructions

938
00:33:06,119 --> 00:33:08,039
and wire these up. And we're not talking

939
00:33:08,039 --> 00:33:10,840
like super complicated things. We're basically talking the

940
00:33:10,840 --> 00:33:13,274
equivalent of like a brew or a Chocolatey

941
00:33:13,274 --> 00:33:15,674
install or a WinGet install, like just little

942
00:33:15,674 --> 00:33:17,194
one liners to get all this stuff up

943
00:33:17,194 --> 00:33:18,875
and running. Absolutely. You don't need to go

944
00:33:18,875 --> 00:33:21,115
write 50 line PowerShell scripts or pipe a

945
00:33:21,115 --> 00:33:24,174
bunch of things. It's really straightforward from everything

946
00:33:24,394 --> 00:33:26,474
I saw. Super easy to get up and

947
00:33:26,474 --> 00:33:28,559
going with that. I would say, like, the

948
00:33:28,559 --> 00:33:30,160
other thing you might wanna do a little

949
00:33:30,160 --> 00:33:30,980
bit is

950
00:33:31,440 --> 00:33:32,980
when you're exploring models.

951
00:33:33,440 --> 00:33:35,759
So if you go into, like, LM Studio

952
00:33:35,759 --> 00:33:37,920
and you're going through their model catalog or

953
00:33:37,920 --> 00:33:40,799
you're on Ollama and you're exploring their model

954
00:33:40,799 --> 00:33:42,755
catalog, you might wanna just start with, like,

955
00:33:42,755 --> 00:33:45,474
some of the more popular ones to get

956
00:33:45,474 --> 00:33:47,954
up and running. So, you know, there

957
00:33:47,954 --> 00:33:49,974
are differences between these things,

958
00:33:50,355 --> 00:33:52,115
you know, depending on what you're doing. Like,

959
00:33:52,115 --> 00:33:53,575
you can't go ask DeepSeek

960
00:33:54,119 --> 00:33:56,920
what happened in Tiananmen Square. Like, that is

961
00:33:56,920 --> 00:33:58,380
not programmed into that model,

962
00:33:58,839 --> 00:34:00,359
even in the one that you

963
00:34:00,359 --> 00:34:01,500
download and

964
00:34:01,799 --> 00:34:03,640
you run locally, but, you know, you can

965
00:34:03,640 --> 00:34:06,279
do that with other stuff. So these models

966
00:34:06,279 --> 00:34:07,720
all vary. The other thing that you can

967
00:34:07,720 --> 00:34:09,000
do is you can go through the model

968
00:34:09,000 --> 00:34:09,500
catalogs,

969
00:34:09,855 --> 00:34:12,414
and you can find models that are purpose

970
00:34:12,414 --> 00:34:13,875
built for certain things.

971
00:34:14,335 --> 00:34:16,974
So there are models that are generated within

972
00:34:16,974 --> 00:34:19,215
these families. So you talk about, like, Llama.

973
00:34:19,215 --> 00:34:21,215
There's gonna be versions of the Llama model

974
00:34:21,215 --> 00:34:23,775
that are better for doing coding assistance things

975
00:34:23,775 --> 00:34:26,039
with it than there are for doing just

976
00:34:26,039 --> 00:34:28,460
straight one shot text summarization,

977
00:34:29,320 --> 00:34:30,619
stuff like that. So

978
00:34:30,920 --> 00:34:32,280
you have to think through that a

979
00:34:32,280 --> 00:34:35,019
little bit too, like, just what's your workflow

980
00:34:35,480 --> 00:34:35,980
and

981
00:34:36,360 --> 00:34:38,300
what are you trying to

982
00:34:38,840 --> 00:34:39,579
get at

983
00:34:40,074 --> 00:34:41,054
along the way?

984
00:34:41,355 --> 00:34:42,954
And then be prepared for a little bit

985
00:34:42,954 --> 00:34:44,094
of latency

986
00:34:44,394 --> 00:34:46,315
and maybe differences in perf when you're running

987
00:34:46,315 --> 00:34:48,315
with these things. I think lots of people

988
00:34:48,315 --> 00:34:49,675
set out and they say, oh, I'm gonna

989
00:34:49,675 --> 00:34:50,875
be able to run that model locally, and

990
00:34:50,875 --> 00:34:52,315
it's gonna be so much faster because it

991
00:34:52,315 --> 00:34:53,594
doesn't need to go out and talk to

992
00:34:53,594 --> 00:34:55,430
the Internet. Like, it doesn't need to talk

993
00:34:55,430 --> 00:34:57,430
to Claude. It doesn't need to

994
00:34:57,430 --> 00:35:00,390
talk to ChatGPT, anything like that. Yeah.

995
00:35:00,390 --> 00:35:03,110
Like, absolutely. You've eliminated the latency of that

996
00:35:03,110 --> 00:35:05,269
whole, like, request response thing having to traverse

997
00:35:05,269 --> 00:35:05,930
the Internet,

998
00:35:06,309 --> 00:35:08,309
but you still have to have the hardware

999
00:35:08,309 --> 00:35:10,150
that's capable of running this and standing it

1000
00:35:10,150 --> 00:35:12,224
all up. So you might wanna even, like,

1001
00:35:12,224 --> 00:35:13,744
play around before you integrate these things. Like,

1002
00:35:13,744 --> 00:35:15,764
if you're interested in, like, a coding workflow

1003
00:35:16,144 --> 00:35:18,304
with or integrating with VS Code, things like

1004
00:35:18,304 --> 00:35:20,065
that, you'll probably wanna play around with the

1005
00:35:20,065 --> 00:35:21,744
models a little bit locally to find

1006
00:35:21,744 --> 00:35:23,424
the one that's got the sweet spot

1007
00:35:23,424 --> 00:35:24,724
for you based on

1008
00:35:25,089 --> 00:35:27,409
number of parameters, your hardware, things like that

1009
00:35:27,409 --> 00:35:29,089
before you go down the path of integrating

1010
00:35:29,089 --> 00:35:30,869
it in VS Code and then being disappointed

1011
00:35:30,929 --> 00:35:32,929
that it's too slow or things like

1012
00:35:32,929 --> 00:35:34,690
that. There's a lot of blogs out there

1013
00:35:34,690 --> 00:35:37,010
that'll just tell you, like, oh, running AI

1014
00:35:37,010 --> 00:35:39,889
locally, like, it's super fast. It's super

1015
00:35:39,889 --> 00:35:41,775
easy. It is super easy. It's not always

1016
00:35:41,775 --> 00:35:43,295
super fast. So you do have

1017
00:35:43,295 --> 00:35:44,815
to be prepared for that depending on your

1018
00:35:44,815 --> 00:35:47,135
hardware. Yeah. Along with the model, Scott, this

1019
00:35:47,135 --> 00:35:49,214
is another thing again, being fairly new to

1020
00:35:49,214 --> 00:35:50,355
this, have you

1021
00:35:50,734 --> 00:35:52,974
compared at all? Because another thing you can

1022
00:35:52,974 --> 00:35:55,139
run into is quantization of these models. Right?

1023
00:35:55,139 --> 00:35:57,299
And this is something else NetworkChuck talked

1024
00:35:57,299 --> 00:35:59,139
about in one of his where some of

1025
00:35:59,139 --> 00:36:01,799
these larger models, they quantize.

1026
00:36:02,260 --> 00:36:03,699
I don't know if that's the word. They

1027
00:36:03,699 --> 00:36:06,579
quantize them down, and it sounds like it's

1028
00:36:06,579 --> 00:36:07,719
essentially taking

1029
00:36:08,184 --> 00:36:10,444
different aspects of the model. And inside

1030
00:36:10,904 --> 00:36:14,025
models, they have model weights with, like, 32

1031
00:36:14,025 --> 00:36:16,424
bit precision, and they reduce these down to

1032
00:36:16,424 --> 00:36:18,444
eight bit, four bit precision,

1033
00:36:18,904 --> 00:36:20,984
which makes them not as accurate but makes

1034
00:36:20,984 --> 00:36:23,644
them smaller so you can run a

1035
00:36:24,239 --> 00:36:26,320
larger model. Some of those bigger ones we

1036
00:36:26,320 --> 00:36:28,179
talked about like 65 billion

1037
00:36:28,639 --> 00:36:29,460
plus parameters

1038
00:36:30,159 --> 00:36:31,300
on less hardware,

1039
00:36:31,920 --> 00:36:33,380
but with more

1040
00:36:34,239 --> 00:36:35,219
not the accuracy

1041
00:36:35,679 --> 00:36:38,420
versus running maybe a model with less parameters,

1042
00:36:38,719 --> 00:36:40,474
but you get the full

1043
00:36:40,855 --> 00:36:42,695
the full model weights in there where you're

1044
00:36:42,695 --> 00:36:45,195
running the 32 bit precision instead of quantizing

1045
00:36:45,255 --> 00:36:47,894
them down. Again, when you're downloading models, definitely

1046
00:36:47,894 --> 00:36:50,215
something to watch out for because if these

1047
00:36:50,215 --> 00:36:51,434
are quantized

1048
00:36:52,630 --> 00:36:54,389
and they have smaller, they can be less

1049
00:36:54,389 --> 00:36:56,389
accurate, you can run them. Like, have you

1050
00:36:56,389 --> 00:36:58,789
ever compared those of let's go run a

1051
00:36:58,789 --> 00:37:03,050
30 billion parameter model on local hardware versus a

1052
00:37:03,829 --> 00:37:05,050
65 billion

1053
00:37:05,109 --> 00:37:08,974
model or parameter model that's quantized down to

1054
00:37:08,974 --> 00:37:10,735
eight bit instead of 32 bit? I don't

1055
00:37:10,735 --> 00:37:13,295
think many folks are running 32 bit. Most

1056
00:37:13,295 --> 00:37:14,595
are probably running

1057
00:37:15,135 --> 00:37:17,855
Four bit. Some kind of like well, something

1058
00:37:17,855 --> 00:37:21,329
like 16 or lower, so like four, eight,

1059
00:37:21,730 --> 00:37:22,849
16. I think when you go out and,

1060
00:37:22,849 --> 00:37:24,710
like, you watch a lot of YouTube videos

1061
00:37:24,849 --> 00:37:25,590
and,

1062
00:37:25,969 --> 00:37:27,250
you know, if you do go down

1063
00:37:27,250 --> 00:37:28,369
this path and you start getting into it,

1064
00:37:28,369 --> 00:37:29,890
I think YouTube is a great place to

1065
00:37:29,890 --> 00:37:31,730
go to and start to see. You'll see

1066
00:37:31,730 --> 00:37:34,469
lots of people playing around with massive models,

1067
00:37:35,025 --> 00:37:36,484
but with a

1068
00:37:37,025 --> 00:37:39,105
like, only, like, four bits. Right. So they're

1069
00:37:39,105 --> 00:37:40,864
doing that just so they can run it,

1070
00:37:40,864 --> 00:37:42,704
not so they can run it effectively to

1071
00:37:42,704 --> 00:37:44,704
drive a workflow. Like, they're just trying to

1072
00:37:44,704 --> 00:37:46,304
try it out and see how many tokens

1073
00:37:46,304 --> 00:37:47,744
a second they can get out of it

1074
00:37:47,744 --> 00:37:49,599
or something like that. So a four bit

1075
00:37:49,599 --> 00:37:50,820
model is

1076
00:37:51,440 --> 00:37:52,980
absolutely going to

1077
00:37:53,280 --> 00:37:56,019
run on, like, consumer grade GPUs, CPUs.

1078
00:37:56,800 --> 00:37:58,320
Like, you're gonna be all good, ready to

1079
00:37:58,320 --> 00:38:00,260
go there, but you have to know that

1080
00:38:00,320 --> 00:38:03,295
it's been extremely compressed. So it can get

1081
00:38:03,295 --> 00:38:05,394
it down to a smaller download size,

1082
00:38:05,695 --> 00:38:08,574
and thus, it's going to take less memory

1083
00:38:08,574 --> 00:38:10,974
and less processing power to go ahead and

1084
00:38:10,974 --> 00:38:13,235
run it. So you might be running like,

1085
00:38:13,295 --> 00:38:14,894
you know, like if I think about, like,

1086
00:38:14,894 --> 00:38:17,074
the transformer that's running in iOS,

1087
00:38:17,489 --> 00:38:18,389
that's probably

1088
00:38:19,090 --> 00:38:20,769
a four bit model. Right? Like,

1089
00:38:20,769 --> 00:38:22,849
it's sitting there. It's running on commodity hardware

1090
00:38:22,929 --> 00:38:24,690
Right. And it's just doing what it needs

1091
00:38:24,690 --> 00:38:27,269
to do. Now, if I'm on my desktop

1092
00:38:27,409 --> 00:38:29,110
or my M1

1093
00:38:29,565 --> 00:38:31,585
MacBook, you know, I might be thinking about

1094
00:38:31,964 --> 00:38:33,964
an eight bit model, and I'm okay with

1095
00:38:33,964 --> 00:38:36,045
the performance trade off. Like, I'm okay

1096
00:38:36,045 --> 00:38:37,804
if it chats with me at, like, you

1097
00:38:37,804 --> 00:38:40,364
know, like, two tokens a second kinda thing.

1098
00:38:40,364 --> 00:38:42,659
Like, it can be super slow. It's

1099
00:38:42,659 --> 00:38:44,500
okay. But you're not gonna run these, like,

1100
00:38:44,500 --> 00:38:48,039
massive models because those are absolutely running in

1101
00:38:48,099 --> 00:38:48,599
those

1102
00:38:49,059 --> 00:38:51,619
massive data centers and that set of

1103
00:38:51,619 --> 00:38:54,099
infrastructure. Like, I just wanna be clear.

1104
00:38:54,099 --> 00:38:56,339
Like, you can't do the things that, like,

1105
00:38:56,339 --> 00:38:56,839
ChatGPT

1106
00:38:57,139 --> 00:38:58,875
can do with, like, o1 running in

1107
00:38:58,875 --> 00:38:59,695
their data center

1108
00:38:59,994 --> 00:39:01,755
locally at your house. Like, that's just not

1109
00:39:01,755 --> 00:39:03,434
the way these things work. It's not

1110
00:39:03,434 --> 00:39:05,114
how they come together. So if you think

1111
00:39:05,114 --> 00:39:07,675
about, like, the whole quantization thing, it's

1112
00:39:07,675 --> 00:39:08,414
all about

1113
00:39:08,715 --> 00:39:11,820
packing things down and basically, like, archiving them,

1114
00:39:11,820 --> 00:39:13,420
right? Put a tar or zip together of

1115
00:39:13,420 --> 00:39:13,920
this

1116
00:39:14,220 --> 00:39:17,039
thing and reduce the size, reduce the computational

1117
00:39:17,340 --> 00:39:17,840
requirements,

1118
00:39:18,780 --> 00:39:21,019
all that kind of stuff. So you're going

1119
00:39:21,019 --> 00:39:23,180
to get small models. Hey, that's great. They're

1120
00:39:23,180 --> 00:39:24,480
going to use less memory,

1121
00:39:24,974 --> 00:39:26,414
and you might be able to run a

1122
00:39:26,414 --> 00:39:28,655
larger model. Like, you could run a four

1123
00:39:28,655 --> 00:39:31,934
bit, you know, 30 billion parameter model, but it's

1124
00:39:31,934 --> 00:39:34,575
gonna be less accurate. And is accuracy important

1125
00:39:34,575 --> 00:39:36,255
to you? Well, you might wanna go to,

1126
00:39:36,255 --> 00:39:39,075
like, an eight bit, like, 7 billion parameter model,

1127
00:39:39,690 --> 00:39:41,369
something like that. So it's gonna be

1128
00:39:41,369 --> 00:39:43,150
very dependent on, like, your workflow

1129
00:39:43,609 --> 00:39:44,109
and

1130
00:39:44,570 --> 00:39:45,070
your

1131
00:39:45,449 --> 00:39:47,150
use case for these things.

1132
00:39:47,449 --> 00:39:49,130
I think the biggest thing you miss out

1133
00:39:49,130 --> 00:39:50,510
on is accuracy.

1134
00:39:51,210 --> 00:39:53,164
So, you know, like, if I'm summarizing

1135
00:39:53,625 --> 00:39:55,864
the podcast transcripts, I want those to be

1136
00:39:55,864 --> 00:39:57,465
kind of accurate. Like, I I don't want

1137
00:39:57,465 --> 00:39:59,305
them to just be hallucinating all over the

1138
00:39:59,305 --> 00:39:59,805
place.

1139
00:40:00,344 --> 00:40:00,844
But,

1140
00:40:01,305 --> 00:40:04,344
you know, if I'm doing something else, like,

1141
00:40:04,344 --> 00:40:06,744
hey. Help me write a poem about, you

1142
00:40:06,744 --> 00:40:09,309
know, iPads. Like, whatever. Do it with all

1143
00:40:09,309 --> 00:40:10,210
the least accuracy

1144
00:40:10,829 --> 00:40:12,289
that you want out there

1145
00:40:12,750 --> 00:40:14,829
along the way. I think the most common

1146
00:40:14,829 --> 00:40:16,269
thing, like so the other thing you run

1147
00:40:16,269 --> 00:40:17,170
into with quantization

1148
00:40:17,710 --> 00:40:18,210
is

1149
00:40:18,589 --> 00:40:20,510
there's a bunch of different methods for

1150
00:40:20,510 --> 00:40:21,764
this. So

1151
00:40:22,144 --> 00:40:22,644
there's

1152
00:40:23,025 --> 00:40:23,525
Q,

1153
00:40:23,985 --> 00:40:26,324
which is basically like four bit quantization.

1154
00:40:27,105 --> 00:40:28,885
There's another format called

1155
00:40:29,344 --> 00:40:32,304
GGUF. So that's kind of,

1156
00:40:32,304 --> 00:40:34,724
like, the standard for running these things

1157
00:40:35,099 --> 00:40:36,539
efficiently. So you'll see a lot of these

1158
00:40:36,539 --> 00:40:38,140
things when you go in like, what's the

1159
00:40:38,140 --> 00:40:40,380
format of the model? Oh, it's a

1160
00:40:40,380 --> 00:40:41,660
GGUF. I don't even know how

1161
00:40:41,660 --> 00:40:42,239
it's pronounced.

1162
00:40:42,539 --> 00:40:44,300
But, you know, you can go in and

1163
00:40:44,300 --> 00:40:45,440
and grab those things

1164
00:40:46,140 --> 00:40:48,174
and and figure those out. So you can

1165
00:40:48,174 --> 00:40:50,434
think of, like, quantization maybe as, like, another

1166
00:40:50,974 --> 00:40:52,815
weight that you can put on that scale

1167
00:40:52,815 --> 00:40:54,914
when you're trying to find that balance between

1168
00:40:55,534 --> 00:40:58,114
model size, parameter count, quantization,

1169
00:40:58,574 --> 00:41:00,809
and the hardware that you run and the

1170
00:41:00,809 --> 00:41:02,489
workload that you wanna do. So how does

1171
00:41:02,489 --> 00:41:04,090
that scale tip and where do you wanna

1172
00:41:04,090 --> 00:41:06,650
land? It just becomes another consideration in

1173
00:41:06,650 --> 00:41:09,610
there for you. Sounds good. Anything else before

1174
00:41:09,610 --> 00:41:12,090
wrapping this episode up? So a couple things.

1175
00:41:12,090 --> 00:41:13,930
If folks haven't done this yet, like Go

1176
00:41:13,930 --> 00:41:16,074
do it. You should totally go out and

1177
00:41:16,074 --> 00:41:18,315
just try and play around with Ollama, LM

1178
00:41:18,315 --> 00:41:18,815
Studio.

1179
00:41:19,114 --> 00:41:20,795
If you're already doing it today, come back

1180
00:41:20,795 --> 00:41:22,315
and give us some feedback. Let us

1181
00:41:22,315 --> 00:41:24,635
know what you're using it for. I think

1182
00:41:24,635 --> 00:41:27,594
there's all sorts of interesting use cases for

1183
00:41:27,594 --> 00:41:29,880
this stuff. We're just getting Ben started on

1184
00:41:29,880 --> 00:41:31,800
his list. Let's make his list a lot

1185
00:41:31,800 --> 00:41:34,519
longer for things that he is missing out

1186
00:41:34,519 --> 00:41:36,679
in his life. Home assistant and AI. He

1187
00:41:36,679 --> 00:41:38,539
needs to do to run a

1188
00:41:38,920 --> 00:41:40,839
chat model locally. And then if you're doing

1189
00:41:40,839 --> 00:41:42,519
other things besides chat models, like I said,

1190
00:41:42,519 --> 00:41:45,265
there's the Stable Diffusions of the world, there's

1191
00:41:45,265 --> 00:41:48,065
image generation, there's Whisper, there's all these other

1192
00:41:48,065 --> 00:41:50,625
things out there. I was very surprised at

1193
00:41:50,625 --> 00:41:52,704
how approachable they are. I always thought this

1194
00:41:52,704 --> 00:41:54,885
was going to be like mystical dark arts

1195
00:41:55,025 --> 00:41:57,184
and magic and not for mere mortals kind

1196
00:41:57,184 --> 00:41:59,400
of thing. It's very much for mere mortals,

1197
00:41:59,480 --> 00:42:01,799
Like, super easy to get started with, super

1198
00:42:01,799 --> 00:42:02,299
turnkey,

1199
00:42:02,679 --> 00:42:04,839
and I would guarantee that almost anybody who

1200
00:42:04,839 --> 00:42:06,920
listens to this podcast probably has the hardware

1201
00:42:06,920 --> 00:42:08,359
to run this stuff and make it happen.

1202
00:42:08,359 --> 00:42:10,940
I'm actually excited to go try this out

1203
00:42:11,000 --> 00:42:13,005
and play around with it. I did find

1204
00:42:13,005 --> 00:42:15,085
an article too on a Raspberry Pi cluster

1205
00:42:15,085 --> 00:42:16,364
for AI. I don't know if I'm gonna

1206
00:42:16,364 --> 00:42:18,204
try that or use an extra Mac mini

1207
00:42:18,204 --> 00:42:20,385
I have sitting around here to start, but

1208
00:42:20,444 --> 00:42:22,045
I, like you, I would love to hear

1209
00:42:22,045 --> 00:42:23,404
what other people are doing. If you are

1210
00:42:23,404 --> 00:42:25,404
running them locally, what are you using them

1211
00:42:25,404 --> 00:42:26,144
for locally,

1212
00:42:26,764 --> 00:42:27,824
different use cases,

1213
00:42:28,364 --> 00:42:29,699
where have you found a good place to

1214
00:42:29,699 --> 00:42:31,000
start, all the things.

1215
00:42:31,300 --> 00:42:33,380
So if you do want to join us

1216
00:42:33,380 --> 00:42:34,679
and discuss these things,

1217
00:42:34,980 --> 00:42:37,059
we need to redo our outro, Scott, because

1218
00:42:37,059 --> 00:42:38,659
I think that has changed. I think we

1219
00:42:38,659 --> 00:42:40,659
actually still have Twitter in it. Let's not

1220
00:42:40,659 --> 00:42:43,414
say Twitter. Let's say probably Bluesky. Are

1221
00:42:43,414 --> 00:42:44,855
you more active on Bluesky right now

1222
00:42:44,855 --> 00:42:46,375
than any other one? Pick one anyway. Anyone

1223
00:42:46,375 --> 00:42:48,054
that's not Twitter, you can find Scott on,

1224
00:42:48,054 --> 00:42:49,494
except that I can never find you on

1225
00:42:49,494 --> 00:42:51,655
Bluesky because you chose a weird handle

1226
00:42:51,655 --> 00:42:53,655
that isn't the same as any of your

1227
00:42:53,655 --> 00:42:55,894
other social media. You need to go grab

1228
00:42:55,894 --> 00:42:57,494
a new handle on Bluesky that matches

1229
00:42:57,494 --> 00:42:59,789
everything else. I would say Bluesky is

1230
00:42:59,789 --> 00:43:01,650
probably where I'm the most active

1231
00:43:02,269 --> 00:43:04,190
as of late and where I feel like

1232
00:43:04,190 --> 00:43:04,849
the biggest

1233
00:43:05,309 --> 00:43:06,849
tech community has

1234
00:43:07,309 --> 00:43:09,630
moved to. So go chat with us on

1235
00:43:09,630 --> 00:43:11,789
Bluesky. LinkedIn is another good one. I'm

1236
00:43:11,789 --> 00:43:12,769
always on LinkedIn.

1237
00:43:13,085 --> 00:43:15,005
So if you wanna chat, give us feedback

1238
00:43:15,005 --> 00:43:15,744
on LinkedIn,

1239
00:43:16,045 --> 00:43:17,484
you can do that. If you wanna sign

1240
00:43:17,484 --> 00:43:19,085
up for membership, we still have our membership

1241
00:43:19,085 --> 00:43:21,744
at msclouditpro.com/membership.

1242
00:43:22,204 --> 00:43:23,264
Todd's in

1243
00:43:23,565 --> 00:43:25,644
Discord today. He got a new laptop that

1244
00:43:25,644 --> 00:43:27,424
he's gonna go try to run some LLMs

1245
00:43:27,484 --> 00:43:29,460
on. So if you wanna join us, chat

1246
00:43:29,460 --> 00:43:31,380
with us during the recording. You can go

1247
00:43:31,380 --> 00:43:32,359
check out our membership

1248
00:43:32,659 --> 00:43:33,159
options

1249
00:43:33,940 --> 00:43:36,179
there as well and join us in Discord

1250
00:43:36,179 --> 00:43:37,480
for these. So

1251
00:43:37,940 --> 00:43:40,260
looking forward to hearing from people how you

1252
00:43:40,260 --> 00:43:42,359
use LLMs, what you're gonna do with LLMs,

1253
00:43:42,885 --> 00:43:45,844
and how they run locally. Who can bury

1254
00:43:45,844 --> 00:43:47,784
their computer first and

1255
00:43:48,405 --> 00:43:50,744
crash it? Super easy to do. Yeah.

1256
00:43:51,204 --> 00:43:53,364
Anything else? I think that's it. As always,

1257
00:43:53,364 --> 00:43:55,710
thanks, Ben. Alright. Thank you, Scott. We will

1258
00:43:55,710 --> 00:43:56,690
talk to you later.

1259
00:43:58,670 --> 00:44:00,909
If you enjoyed the podcast, go leave us

1260
00:44:00,909 --> 00:44:03,150
a five star rating in iTunes. It helps

1261
00:44:03,150 --> 00:44:04,829
to get the word out so more IT

1262
00:44:04,829 --> 00:44:06,989
pros can learn about Office 365

1263
00:44:06,989 --> 00:44:07,650
and Azure.

1264
00:44:08,190 --> 00:44:09,855
If you have any questions you want us

1265
00:44:09,855 --> 00:44:12,014
to address on the show, or feedback about

1266
00:44:12,014 --> 00:44:14,414
the show, feel free to reach out via

1267
00:44:14,414 --> 00:44:16,514
our website, Twitter, or Facebook.

1268
00:44:16,815 --> 00:44:18,735
Thanks again for listening, and have a great

1269
00:44:18,735 --> 00:44:19,235
day.