Mailing List Archive: [PATCH] New: AMD K6 write allocate

[PATCH] New: AMD K6 write allocate

Jan 17, 1999, 9:31 AM

Post #1 of 15 (946 views)

Hello.
> Do not forget to substitute your RAM size in
>#define WRITE_ALLOCATE_LIMIT
Thanks to Alan Cox's hint, there is no need for manual tuning now.
The attached new version of the patch (against 2.2.0pre7) should enable write
allocate on AMD K6 model 6/7 processors. On my system it gives a 5-7%
increase in speed.
--
With best wishes,
Alexey Vyskubov.
This is a message. Or something silly like that.
Attachment: k6-patch
--- linux/arch/i386/kernel/setup.c.was Sun Jan 17 12:17:11 1999
+++ linux/arch/i386/kernel/setup.c Sun Jan 17 19:13:26 1999
@@ -601,6 +601,31 @@
static char *cpu_vendor_names[] __initdata = {
"Intel", "Cyrix", "AMD", "UMC", "NexGen", "Centaur" };

+#define AMD_WHCR 0xc0000082 /* AMD MSR WHCR */
+
+int enable_write_allocate (int size)
+{
+ u32 ra, rd;
+ rd = 0;
+
+ ra = size >> 2; /* EAX = 2*(RAM/4) */
+ ra = (ra << 1) + 1; /* +1 for no memory hole at 15-16MB */
+
+
+#define walloc(msr, val1, val2) __asm__ __volatile__ (" pushf
+ cli
+ wbinvd
+ wrmsr
+ popf"\
+ : /* no outputs */\
+ : "c" (msr),\
+ "a" (val1), "d" (val2))
+ walloc(AMD_WHCR, ra, rd);
+ rdmsr(AMD_WHCR, ra, rd); /* Let's see what's happened */
+ ra = ra & 0xfe; /* Drop last bit (memory hole) */
+ ra = ra << 1;
+ return((int)ra);
+}

__initfunc(void print_cpu_info(struct cpuinfo_x86 *c))
{
@@ -621,8 +646,54 @@

if (c->x86_mask || c->cpuid_level>=0)
printk(" stepping %02x", c->x86_mask);
-
- if(c->x86_vendor == X86_VENDOR_CENTAUR)
+
+ if ( (c->x86_vendor == X86_VENDOR_AMD) &&
+ ((c->x86_model == 6) || (c->x86_model == 7)))
+ {
+
+ u32 ra, rd;
+ u32 ram;
+ int mem;
+
+
+ printk("\nAMD K6 processor, model %d found.", c->x86_model);
+ printk("\nChecking write allocate state... ");
+
+ ram = max_mapnr * (PAGE_SIZE >> 10); /* RAM size in Kb */
+ ram = ram >> 10; /* RAM size in Mb */
+
+ rdmsr(AMD_WHCR, ra, rd);
+ mem = (ra & 0xfe) << 1;
+
+ if (mem) {
+ printk("Write allocate is enabled for %d MB of memory", mem);
+ } else {
+ printk("Write allocate is disabled.");
+ }
+
+ if (!(mem == ram)) {
+ printk("\nTrying to enable write alocate for %d MB of memory", ram);
+ mem = ram;
+ mem = enable_write_allocate(mem);
+
+ if (!(mem == ram)) {
+ printk("\nFailed!!!");
+ }
+ printk("\nWrite allocate is enabled for %d MB of memory", mem);
+ }
+
+ rdmsr(AMD_WHCR, ra, rd);
+
+ if (ra & 0x100) {
+ printk("\nWARNING: WCDE bit is 1.");
+ }
+
+ if (!(ra & 0x1)) {
+ printk("\nWARNING: Memory hole at 15-16M is enabled.");
+ }
+ }
+
+ if (c->x86_vendor == X86_VENDOR_CENTAUR)
{
u32 hv,lv;
rdmsr(0x107, lv, hv);
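
For readers following the shifts and masks in the patch, the WHCR layout it relies on can be restated as plain C. This is only a sketch inferred from the diff above: bit 0 is treated as the "no memory hole at 15-16MB" flag, bits 7:1 as the limit in 4 MB units, and bit 8 as WCDE. The helper names whcr_encode and whcr_decode_mb are made up for illustration, and nothing here touches a real MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout as used by the patch above (low 32 bits of WHCR). */
#define WHCR_NO_HOLE_15M 0x001u  /* bit 0: set = no memory hole at 15-16MB */
#define WHCR_LIMIT_MASK  0x0feu  /* bits 7:1: limit in 4 MB units */
#define WHCR_WCDE        0x100u  /* bit 8: WCDE */

/* Hypothetical helper: build the EAX value the patch writes for size_mb. */
static uint32_t whcr_encode(uint32_t size_mb)
{
    uint32_t chunks = size_mb >> 2;       /* whole 4 MB chunks */
    assert(chunks <= 0x7f);               /* the masked field is 7 bits wide */
    return (chunks << 1) | WHCR_NO_HOLE_15M;
}

/* Hypothetical helper: recover the covered size in MB, as the patch does. */
static uint32_t whcr_decode_mb(uint32_t eax)
{
    return (eax & WHCR_LIMIT_MASK) << 1;  /* (chunks << 1) << 1 == chunks * 4 */
}
```

With 48 MB of RAM the encoded value is 25 (12 chunks shifted left by one, plus the hole bit), and decoding 25 gives 48 MB back. A size that is not a multiple of 4 MB loses its remainder on the way through.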
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

rreilova at ececs

Jan 17, 1999, 12:48 PM

Post #2 of 15 (945 views)

Hi,
Your patch is OK. Just two minor comments:
1) Make enable_write_allocate an __initfunc so that the code is
discarded after boot and doesn't bloat the kernels of people who may not
have an AMD.
2) Move the code to the amd_model function, although that function is
currently misnamed, since it's used for the Cyrix MediaGX detection too.
It would probably be best to separate the extended cpuid part from the real
AMD-specific part. That will save some branch code (for what that may be worth).
Cheers,
Rafael
On Sun, 17 Jan 1999, Alexey Vyskubov wrote:
> Hello.
>
> > Do not forget to substitute your RAM size in
> >#define WRITE_ALLOCATE_LIMIT
>
> Thanks to Alan Cox's hint - there is no need for manual tuning now.
>
> Attached [new] version of patch (against 2.2.0pre7) should enable write
> allocate for AMD K6 model 6/7 processors. On my system it gives 5-7%
> increase in speed.
>
> --
> With best wishes,
> Alexey Vyskubov.
> This is a message. Or something silly like that.
>

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

alexey at alv

Jan 17, 1999, 1:18 PM

Post #3 of 15 (943 views)

> Your patch is OK. Just two minor comments:
>
> 1) make the enable_write_allocate an _initfunc so that the code is
> discarded after boot and doesn't bloat other people's kernels (who may
> not have an AMD).
Done. I also made some more changes:
* Cosmetic cleanup (thanks to Philipp Rumpf)
* Fixed a nasty bug: enabling write allocate failed if the RAM size was 128MB
or greater
> 2) Move the code to the amd_model function. Although currently that
> function is misnamed, since it's used for the Cyrix MediaGX detection
> too.
> Probably separate the extended cpuid part, from the real AMD specific
> part. That will save some branch code (for what that may be worth).
I haven't done this yet. I need to look at the code and think about the best way to do it.

--
With best wishes,
Alexey Vyskubov.
This is a message. Or something silly like that.
Attachment: k6-patch
--- linux/arch/i386/kernel/setup.c.was Sun Jan 17 12:17:11 1999
+++ linux/arch/i386/kernel/setup.c Sun Jan 17 23:14:26 1999
@@ -601,6 +601,30 @@
static char *cpu_vendor_names[] __initdata = {
"Intel", "Cyrix", "AMD", "UMC", "NexGen", "Centaur" };

+#define AMD_WHCR 0xc0000082 /* AMD MSR WHCR */
+
+__initfunc(u32 enable_write_allocate (u32 size))
+{
+ u32 ra, rd;
+ rd = 0;
+
+ ra = size >> 2; /* EAX = 2*(RAM/4) */
+ ra = (ra << 1) | 1; /* +1 for no memory hole at 15-16MB */
+
+
+#define walloc(msr, val1, val2) __asm__ __volatile__ (" pushf
+ cli
+ wbinvd
+ wrmsr
+ popf"\
+ : /* no outputs */\
+ : "c" (msr),\
+ "a" (val1), "d" (val2))
+ walloc(AMD_WHCR, ra, rd);
+ rdmsr(AMD_WHCR, ra, rd); /* Let's see what's happened */
+ ra = (ra & 0xfe) << 1; /* Drop last bit (memory hole) */
+ return(ra);
+}

__initfunc(void print_cpu_info(struct cpuinfo_x86 *c))
{
@@ -621,8 +645,51 @@

if (c->x86_mask || c->cpuid_level>=0)
printk(" stepping %02x", c->x86_mask);
-
- if(c->x86_vendor == X86_VENDOR_CENTAUR)
+
+ if ( (c->x86_vendor == X86_VENDOR_AMD) &&
+ ((c->x86_model == 6) || (c->x86_model == 7)))
+ {
+
+ u32 ra, rd;
+ u32 ram;
+
+ printk("\nAMD K6 processor, model %d found.", c->x86_model);
+ printk("\nChecking write allocate state... ");
+
+ ram = max_mapnr * (PAGE_SIZE >> 10); /* RAM size in Kb */
+ ram >>= 10; /* RAM size in Mb */
+
+ rdmsr(AMD_WHCR, ra, rd);
+ ra = (ra & 0xfe) << 1;
+
+ if (ra) {
+ printk("Write allocate is enabled for %d MB of memory", ra);
+ } else {
+ printk("Write allocate is disabled.");
+ }
+
+ if (!(ra == ram)) {
+ printk("\nTrying to enable write alocate for %d MB of memory", ram);
+ ra = enable_write_allocate(ram);
+
+ if (!(ra == ram)) {
+ printk("\nFailed!!!");
+ }
+ printk("\nWrite allocate is enabled for %d MB of memory", ra);
+ }
+
+ rdmsr(AMD_WHCR, ra, rd);
+
+ if (ra & 0x100) {
+ printk("\nWARNING: WCDE bit is 1.");
+ }
+
+ if (!(ra & 0x1)) {
+ printk("\nWARNING: Memory hole at 15-16M is enabled.");
+ }
+ }
+
+ if (c->x86_vendor == X86_VENDOR_CENTAUR)
{
u32 hv,lv;
rdmsr(0x107, lv, hv);

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

rjanssen at ns

Jan 18, 1999, 2:36 AM

Post #4 of 15 (946 views)

At 07:31 PM 1/17/99 +0300, Alexey Vyskubov wrote:
>Hello.
>
>> Do not forget to substitute your RAM size in
>>#define WRITE_ALLOCATE_LIMIT
>
>Thanks to Alan Cox's hint - there is no need for manual tuning now.
>
>Attached [new] version of patch (against 2.2.0pre7) should enable write
>allocate for AMD K6 model 6/7 processors. On my system it gives 5-7%
>increase in speed.
>
Could you show some lmbench results for this patch? I checked a similar
patch (a loadable module), but lmbench didn't show much improvement. Is it
really that much?
Thanks,
René

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

alexey at alv

Jan 18, 1999, 6:00 AM

Post #5 of 15 (945 views)

Hello.
>>Attached [new] version of patch (against 2.2.0pre7) should enable write
>>allocate for AMD K6 model 6/7 processors. On my system it gives 5-7%
>>increase in speed.
>>
>
> Could you show some lmbench results for this patch? I checked a similar
> patch (loadable module) , but lmbench didnt notice much improvement. Is
> it really that much ?
Maybe your BIOS enables write allocate by itself? Mine doesn't.
I had never heard of this nice program before, but I got it and ran it. Results
follow. alv is 2.2.0pre7; alv-a is 2.2.0pre7 with K6 write allocate enabled for
all my RAM (48MB).
-------------- begin: k6-patch-results --------------
L M B E N C H 1 . 9 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS            Mhz  null null      open selct sig  sig  fork exec sh
                             call I/O  stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
K6-alv    Linux 2.2.0-p  233  0.9  1.4    7   10 0.07K  2.5    5 1.2K   5K  33K
K6-alv-a  Linux 2.2.0-p  233  0.9  1.4    6    9 0.07K  2.5    5 0.9K   4K  29K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
K6-alv Linux 2.2.0-p 5 30 348 111 604 147 688
K6-alv-a Linux 2.2.0-p 5 27 299 94 565 129 583
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
K6-alv Linux 2.2.0-p 5 20 63 65 95 359
K6-alv-a Linux 2.2.0-p 5 20 47 66 100 361
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host      OS            0K File       10K File      Mmap    Prot  Page
                        Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
K6-alv    Linux 2.2.0-p     21      3     98      7   22240     2  9.6K
K6-alv-a  Linux 2.2.0-p     22      3    107      5   17653     3  2.3K
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host      OS            Pipe AF   TCP  File   Mmap   Bcopy  Bcopy  Mem  Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
K6-alv    Linux 2.2.0-p   30   17   16     27     95     33     33   95    53
K6-alv-a  Linux 2.2.0-p   56   32   20     36     99     35     35   99    58
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
---------------------------------------------------
Host OS Mhz L1 $ L2 $ Main mem Guesses
--------- ------------- --- ---- ---- -------- -------
K6-alv Linux 2.2.0-p 233 8 207 340
K6-alv-a Linux 2.2.0-p 233 8 198 325
--------------- end: k6-patch-results ---------------
--
With best wishes,
Alexey Vyskubov.
This is a message. Or something silly like that.

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

alexey at alv

Jan 18, 1999, 9:04 AM

Post #6 of 15 (945 views)

Hello!
> Could you show some lmbench results for this patch? I checked a similar
> patch (loadable module) , but lmbench didnt notice much improvement. Is
> it really that much ?
Some more testing: x11perf:
*** W/o K6 write allocate patch: ***
x11perf - X11 performance program, version 1.5
The XFree86 Project, Inc server version 3330 on :0.0
from alv
Mon Jan 18 18:57:54 1999
Sync time adjustment is 0.2409 msecs.
2000000 reps @ 0.0027 msec (364000.0/sec): 10x10 rectangle
2000000 reps @ 0.0027 msec (366000.0/sec): 10x10 rectangle
2000000 reps @ 0.0027 msec (365000.0/sec): 10x10 rectangle
2000000 reps @ 0.0027 msec (366000.0/sec): 10x10 rectangle
2000000 reps @ 0.0027 msec (366000.0/sec): 10x10 rectangle
10000000 trep @ 0.0027 msec (365000.0/sec): 10x10 rectangle
x11perf - X11 performance program, version 1.5
The XFree86 Project, Inc server version 3330 on :0.0
from alv
Mon Jan 18 18:57:11 1999
Sync time adjustment is 0.2436 msecs.
6000000 reps @ 0.0010 msec (1030000.0/sec): 1x1 rectangle
6000000 reps @ 0.0010 msec (1000000.0/sec): 1x1 rectangle
6000000 reps @ 0.0010 msec (1020000.0/sec): 1x1 rectangle
6000000 reps @ 0.0010 msec (1000000.0/sec): 1x1 rectangle
6000000 reps @ 0.0010 msec (1020000.0/sec): 1x1 rectangle
30000000 trep @ 0.0010 msec (1020000.0/sec): 1x1 rectangle
*** W/ K6 write allocate patch: ***
x11perf - X11 performance program, version 1.5
The XFree86 Project, Inc server version 3330 on :0.0
from alv
Mon Jan 18 18:47:12 1999
Sync time adjustment is 0.2273 msecs.
2000000 reps @ 0.0026 msec (387000.0/sec): 10x10 rectangle
2000000 reps @ 0.0026 msec (386000.0/sec): 10x10 rectangle
2000000 reps @ 0.0026 msec (386000.0/sec): 10x10 rectangle
2000000 reps @ 0.0026 msec (386000.0/sec): 10x10 rectangle
2000000 reps @ 0.0026 msec (386000.0/sec): 10x10 rectangle
10000000 trep @ 0.0026 msec (386000.0/sec): 10x10 rectangle
x11perf - X11 performance program, version 1.5
The XFree86 Project, Inc server version 3330 on :0.0
from alv
Mon Jan 18 18:47:58 1999
Sync time adjustment is 0.2292 msecs.
6000000 reps @ 0.0009 msec (1140000.0/sec): 1x1 rectangle
6000000 reps @ 0.0009 msec (1140000.0/sec): 1x1 rectangle
6000000 reps @ 0.0009 msec (1130000.0/sec): 1x1 rectangle
6000000 reps @ 0.0009 msec (1150000.0/sec): 1x1 rectangle
6000000 reps @ 0.0009 msec (1130000.0/sec): 1x1 rectangle
30000000 trep @ 0.0009 msec (1140000.0/sec): 1x1 rectangle
So for the 10x10 rectangle the increase in speed is
(386000-365000)*100/365000 = 5.75%
and for the 1x1 rectangle it is
(1140000 - 1020000)*100/1020000 = 11.76%
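
The two percentages above follow from one formula. As a minimal sketch (the function name percent_gain is made up for illustration), the same arithmetic in C looks like this:

```c
#include <assert.h>

/* Relative gain in percent when a rate goes from `before` to `after`. */
static double percent_gain(double before, double after)
{
    assert(before > 0.0);                 /* avoid dividing by zero */
    return (after - before) * 100.0 / before;
}
```

Called with the x11perf totals quoted above (365000 and 386000 per second for 10x10, 1020000 and 1140000 for 1x1), it reproduces the 5.75% and 11.76% figures to two decimal places.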
--
With best wishes,
Alexey Vyskubov.
This is a message. Or something silly like that.

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

cel at monkey

Jan 18, 1999, 1:51 PM

Post #7 of 15 (945 views)

On Mon, 18 Jan 1999, Alexey Vyskubov wrote:
> > Could you show some lmbench results for this patch? I checked a similar
> > patch (loadable module) , but lmbench didnt notice much improvement. Is
> > it really that much ?
>
> Maybe your BIOS enables write allocate by itself? Mine doesn't.
>
> Well, never heard about this nice program before. Well, I got it. Results
> follow. alv is 2.2.0pre7. alv-a is 2.2.0pre7 with K6 write allocate for
> all my RAM (48MB).
alexey-
i've installed your k6-patch on a 2.2.0-pre7-ac6 kernel, running on a
233Mhz AMD-K6 model 6 with 64M (Via VP3 chipset, Award 4.51PG BIOS).
here's dmesg output:
Memory: 62684k/65528k available (1384k kernel code, 412k reserved, 988k
data, 60k init)
CPU: AMD AMD-K6tm w/ multimedia extensions stepping 02
AMD K6 processor, model 6 found.
Checking write allocate state... Write allocate is enabled for 64
MB of memory
Trying to enable write alocate for 63 MB of memory
Failed!!!
Write allocate is enabled for 60 MB of memory
Checking 386/387 coupling... OK, FPU using exception 16 error
reporting.
Checking 'hlt' instruction... OK.
apparently the BIOS automatically enables write allocate. however, is it
correct to reset then re-enable it if the BIOS has already set it up?
- Chuck Lever
--
corporate: <chuckl@netscape.com>
personal: <chucklever@netscape.net> or <cel@monkey.org>

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

alexey at alv

Jan 18, 1999, 1:58 PM

Post #8 of 15 (943 views)

Hello!
On 18-Jan-99 Chuck Lever wrote:
>
> i've installed your k6-patch on a 2.2.0-pre7-ac6 kernel, running on a
> 233Mhz AMD-K6 model 6 with 64M (Via VP3 chipset, Award 4.51PG BIOS).
>
> here's dmesg output:
>
> Memory: 62684k/65528k available (1384k kernel code, 412k reserved, 988k
> data, 60k init)
> CPU: AMD AMD-K6tm w/ multimedia extensions stepping 02
> AMD K6 processor, model 6 found.
> Checking write allocate state... Write allocate is enabled for 64
> MB of memory
> Trying to enable write alocate for 63 MB of memory
> Failed!!!
> Write allocate is enabled for 60 MB of memory
> Checking 386/387 coupling... OK, FPU using exception 16 error
> reporting.
> Checking 'hlt' instruction... OK.
>
> apparently the BIOS automatically enables write allocate. however, is
> it
> correct to reset then re-enable it if the BIOS has already set it up?
Well, it looks like I know where the problem is. You have 64 MB of RAM, and
shadow RAM is enabled in BIOS setup, so the Linux kernel sees just 65528k of
memory (which is < 64 MB). When my patch tries to determine the memory size in
MB, it gets 63. Because write allocate is enabled for 64 MB, which is not 63 MB,
it tries to re-enable write allocate. Write allocate can be enabled only in
4 MB chunks, so it can enable write allocate for only 60 MB of memory.
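
The chain of truncations described here can be traced step by step. The following is a sketch of the arithmetic only, using the same shifts as the patch; the function name programmed_mb is an assumption made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in MB that ends up programmed for a given amount of
 * kernel-visible RAM in kB. Both right shifts discard the remainder.
 */
static uint32_t programmed_mb(uint32_t ram_kb)
{
    uint32_t ram_mb  = ram_kb >> 10;   /* 65528 kB -> 63 MB */
    uint32_t chunks  = ram_mb >> 2;    /* 63 MB -> 15 chunks of 4 MB */
    uint32_t covered = chunks << 2;    /* 15 chunks -> 60 MB */
    assert(covered <= ram_mb);         /* truncation never rounds up */
    return covered;
}
```

For 65528 kB the result is 60, while a full 65536 kB would give 64, so the missing 8 kB costs a whole 4 MB chunk.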
I can fix this, but I have two options:
(1) Do not re-enable write allocate if it is already enabled for a memory
size >= the memory available to the kernel (the check is "=" now), maybe with
a warning.
(2) Try to determine the physical memory size and re-enable write allocate if
and only if it is not enabled for this memory size, maybe with a warning
about shadow memory.
Which should I do?
P.S. Turning shadow memory off will probably fix the problem. (Write
allocate should not be re-enabled in this case. You'll see just the first
message: "Write allocate is enabled for 64 MB of memory".) Could you try
this for me, please?
--
With best wishes,
Alexey Vyskubov.
This is a message. Or something silly like that.

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

cel at monkey

Jan 18, 1999, 11:15 PM

Post #9 of 15 (948 views)

On Mon, 18 Jan 1999, Alexey Vyskubov wrote:
> Date: Mon, 18 Jan 1999 23:58:23 +0300 (MSK)
> From: Alexey Vyskubov <alexey@alv.stud.pu.ru>
> To: Chuck Lever <cel@monkey.org>
> Cc: linux-kernel@vger.rutgers.edu
> Subject: Re: [PATCH] New: AMD K6 write allocate
>
> Well, looks like I know where is the problem. You have 64 MB of RAM, and
> shadow RAM is enabled in BIOS setup. So Linux kernel sees just 65528k of
> memory (which is < 64Mb). When my patch tries to determine memory size in
> Mb, it gets 63. Because write allocate is enabled for 64 != 63 Mb memory,
> it tries to re-enable write allocate. Write allocate can be enabled only in
> 4Mb chunks, so it can enable write allocate only for 60Mb of memory.
>
> I can fix this, but I have two options:
>
> (1) Do not re-enable write allocate if it's already enabled for memory
> size >= memory, available for kernel (= now) - maybe with the warning
>
> (2) Try to determine physical memory size and re-enable write allocate if
> and only if it's not enabled for this memory size - maybe with warning
> about shadow memory.
>
> What to do?
perhaps your logic could check if write allocate is enabled at all. if
it's not already enabled, only then try to enable it. that would allow
Linux to use the BIOS's default setting on hardware that automatically
enables write allocate.
> P.S. Turning shadow memory off probably will fix the problem. (Write
> allocate should not be re-enabled in this case. You'll see just first
> message: "Write allocate is enabled for 64 MB of memory".) Could you try
> this for me, please?
yes, i've disabled everything marked "shadow", and turned off VGA and
System BIOS caching. no change -- it still re-enables only 60M. (i never
noticed that missing 8k before :)
- Chuck Lever
--
corporate: <chuckl@netscape.com>
personal: <chucklever@netscape.net> or <cel@monkey.org>

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

alexey at alv

Jan 18, 1999, 11:34 PM

Post #10 of 15 (945 views)

> perhaps your logic could check if write allocate is enabled at all. if
> it's not already enabled, only then try to enable it. that would allow
> Linux to use the BIOS's default setting on hardware that automatically
> enables write allocate.
>
My logic already does that. But it detects only 63 MB of RAM on your system
(because of the missing 8 KB) and detects write allocate for 64 MB. That is
not 63, so it tries to re-enable write allocate for 63 :)
Well, I'll fix this problem with the missing memory in a couple of days.
--
With best wishes,
Alexey Vyskubov.
This is a message. Or something silly like that.

Re: Linux and physical memory [ In reply to ]

jj at sunsite

Jan 19, 1999, 12:37 AM

Post #11 of 15 (946 views)

> Jakub, you forget one important thing, and something we have been
> doing on sparc64 for some time now (not to support a lot of memory,
> but for performance, to eliminate some TLB thrashing).
>
> Anonymous pages aren't an issue at all if you can directly and
> efficiently load/unload TLB mappings in software like we can on
> sparc64. In fact, for sparc64, it is more efficient to (and we do):
I think I've mentioned that possibility in my post already, it would be
something like mmu_get_scsi_one/mmu_release_scsi_one on sun4d.
There are two major problems I can see. You need to modify page tables on
Intel and then go and flush them, which is kinda slow on x86 (someone
correct me if there is some fast method). Also, if you add PG_highmem, I bet
current page_alloc will suck badly (even the way PG_dma is done now is
very inefficient). I think in 2.3 page_alloc should be rewritten
so that it can cope with multiple different classes of pages and allocate
them quickly. It should have some notion of class requirement or class
recommendation. So you could say "I require a DMAble page", or "give me a
non-DMAble page preferably, but if you only have DMAble, then give that to me".
It would be a win even for x86 floppy drivers: you wouldn't run out of DMA
pages so quickly.
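
The difference between a class requirement and a class recommendation can be shown with a toy allocator. This is not the kernel's page_alloc and not a proposal for its interface; the types, names and the two-class setup are all assumptions chosen to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page classes; more could be added before NR_CLASSES. */
enum page_class { CLASS_DMA, CLASS_NORMAL, NR_CLASSES };

/* Toy allocator state: just a count of free pages per class. */
struct page_pool {
    unsigned long free[NR_CLASSES];
};

/*
 * Take one page from the pool. With `required` set, only `want` will do.
 * Otherwise `want` is a preference and any other class with free pages
 * is an acceptable fallback. Returns the class used, or -1 on failure.
 */
static int alloc_page_class(struct page_pool *pool, enum page_class want,
                            bool required)
{
    assert(want < NR_CLASSES);
    if (pool->free[want] > 0) {
        pool->free[want]--;
        return (int)want;
    }
    if (required)
        return -1;                      /* a requirement cannot fall back */
    for (int c = 0; c < NR_CLASSES; c++) {
        if (pool->free[c] > 0) {
            pool->free[c]--;
            return c;
        }
    }
    return -1;
}
```

A caller that merely prefers a non-DMA page leaves the DMA pool untouched as long as normal pages remain, which is the effect described above for floppy drivers.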
Also, when I started thinking about Linux DR (actually just the memory
hotplug, which I'd definitely like to see in 2.4), I guess we'll need
PG_rellocatable to indicate that a page's address is stored only in the page
tables of some tasks, so that the kernel could swap/move them out if necessary.
The whole area of mem_map will be such a place in Linux/sparc64, as we
definitely cannot abandon the idea of freeing unused mem_map parts (think
about .5G wasted on some E10k configurations, even if they could have 1G of
RAM (unlikely, but possible)), but if someone inserts a board, we'll have
to use the mem_map. We could do some indirection in MAP_NR, but I guess we'd
lose performance badly. Current page_alloc cannot scale with multiple
class requirements, so neither PG_highmem nor PG_rellocatable can be done
at the moment, IMHO.
Cheers,
Jakub
___________________________________________________________________
Jakub Jelinek | jj@sunsite.mff.cuni.cz | http://sunsite.mff.cuni.cz
Administrator of SunSITE Czech Republic, MFF, Charles University
___________________________________________________________________
UltraLinux | http://ultra.linux.cz/ | http://ultra.penguin.cz/
Linux version 2.2.0-pre7 on a sparc64 machine (3958.37 BogoMips)
___________________________________________________________________

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

rjanssen at ns

Jan 19, 1999, 1:23 AM

Post #12 of 15 (944 views)

At 04:00 PM 1/18/99 +0300, Alexey Vyskubov wrote:
>Hello.
>
>>>Attached [new] version of patch (against 2.2.0pre7) should enable write
>>>allocate for AMD K6 model 6/7 processors. On my system it gives 5-7%
>>>increase in speed.
>>>
>>
>> Could you show some lmbench results for this patch? I checked a similar
>> patch (loadable module) , but lmbench didnt notice much improvement. Is
>> it really that much ?
>
>Maybe your BIOS enables write allocate by itself? Mine doesn't.
>
Yes, your patch detected that my BIOS (asus tx97xe/AWARD) enabled it already.
René

Re: [PATCH] New: AMD K6 write allocate [ In reply to ]

helge.hafting at daldata

Jan 19, 1999, 2:24 AM

Post #13 of 15 (943 views)

[...]
> Well, looks like I know where is the problem. You have 64 MB of RAM, and
> shadow RAM is enabled in BIOS setup. So Linux kernel sees just 65528k of
> memory (which is < 64Mb). When my patch tries to determine memory size in
> Mb, it gets 63. Because write allocate is enabled for 64 != 63 Mb memory,
> it tries to re-enable write allocate. Write allocate can be enabled only in
> 4Mb chunks, so it can enable write allocate only for 60Mb of memory.
>
> I can fix this, but I have two options:
>
> (1) Do not re-enable write allocate if it's already enabled for memory
> size >= memory, available for kernel (= now) - maybe with the warning
>
> (2) Try to determine physical memory size and re-enable write allocate if
> and only if it's not enabled for this memory size - maybe with warning
> about shadow memory.
>
> What to do?
Is there anything wrong with enabling write-allocation for
the entire memory range, including the missing 8k? The kernel
won't write anything to the "missing" memory anyway.
Looks like the bios does this already in some cases,
so it doesn't seem too wrong.
And it enables write allocation for the last 3MB.
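
Rounding up instead of down is a small change in the arithmetic. As a sketch of this suggestion (the function and macro names are assumptions, and it starts from the size in kB rather than from the already truncated MB value):

```c
#include <assert.h>
#include <stdint.h>

#define KB_PER_4MB_CHUNK 4096u   /* 4 MB expressed in kB */

/* Round a size in kB up to the next 4 MB boundary and return it in MB. */
static uint32_t ram_kb_to_limit_mb(uint32_t ram_kb)
{
    /* Guard against wraparound in the addition below. */
    assert(ram_kb <= UINT32_MAX - (KB_PER_4MB_CHUNK - 1));
    uint32_t chunks = (ram_kb + KB_PER_4MB_CHUNK - 1) / KB_PER_4MB_CHUNK;
    return chunks * 4u;
}
```

For the 65528 kB reported in the dmesg output this gives 64 MB, which matches what the BIOS had already programmed, so no rewrite of the register would be attempted.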
Helge Hafting

Re: Linux and physical memory [ In reply to ]

sct at redhat

Jan 21, 1999, 7:25 AM

Post #14 of 15 (943 views)

Hi,
On Mon, 18 Jan 1999 20:51:23 -0800, "David S. Miller" <davem@dm.cobaltmicro.com> said:
> [. CC:'d Ingo and Stephen, they might actually implement this, hint
> hint :-) ]
We plan to. :)
> From: Jakub Jelinek <jj@sunsite.ms.mff.cuni.cz>
> Date: Tue, 19 Jan 1999 00:43:33 +0100 (CET)
> You cannot simply make a difference between user and other pages.
> Think about e.g. COW. Which page would it access through user
> address space? The source page or the destination? One would have
> to be mapped in kernel space (Linux COW uses both source and
> destination here within the kernel mapping of all virtual memory).
Yes, but the page does not have to be in the kernel's mapping _all_ of
the time.
The way I'm currently leaning towards is to reserve something like 100MB
of the kernel's virtual address space for dynamic VA mappings on Intel,
hooked on a simple LRU list. For every physical page, add a page->kva
pointer containing that page's current kernel virtual address, and use a
page_kva() macro to find the virtual address for any given struct page.
It is relatively straightforward to recycle the mappings in that 100MB
region to make sure that all pages the kernel is currently using have a
valid kva, and that we don't have to update the va mappings for every
single kernel access.
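
The recycling idea can be modelled in user space. What follows is a toy under stated assumptions, not kernel code: the window has four slots instead of 100MB of address space, the page carries a slot index rather than a real page->kva pointer, and the struct and field names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_KVA_SLOTS 4                  /* stand-in for the kva window */

struct page {
    int kva_slot;                       /* slot index, or -1 if unmapped */
};

struct kva_window {
    struct page *owner[NR_KVA_SLOTS];   /* page currently using each slot */
    unsigned long last_use[NR_KVA_SLOTS];
    unsigned long clock;                /* incremented on every lookup */
};

/*
 * Toy page_kva(): return the slot that maps `pg`, assigning one if needed.
 * A free slot is used when there is one; otherwise the least recently used
 * slot is taken away from its owner. A real version would also rewrite the
 * page table entry and handle the TLB at this point.
 */
static int page_kva(struct kva_window *w, struct page *pg)
{
    assert(w != NULL && pg != NULL);
    w->clock++;
    if (pg->kva_slot < 0) {
        int victim = 0;
        for (int i = 0; i < NR_KVA_SLOTS; i++) {
            if (w->owner[i] == NULL) { victim = i; break; }
            if (w->last_use[i] < w->last_use[victim])
                victim = i;
        }
        if (w->owner[victim] != NULL)
            w->owner[victim]->kva_slot = -1;   /* evict the old mapping */
        w->owner[victim] = pg;
        pg->kva_slot = victim;
    }
    w->last_use[pg->kva_slot] = w->clock;
    return pg->kva_slot;
}
```

The toy ignores pinning: a page that is in the middle of an I/O or a copy would have to be protected from eviction, which this sketch does not attempt.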
This works pretty well because the number of times we have to access
user memory via such a mechanism is pretty small. ptrace() is the
obvious one. COW is easy: when we take a COW fault, the existing data
is already in the faulting process's VA, so we just copy it out of the
existing mm context and into the newly allocated page (we may need to
kva-map the new page, of course). Similarly, read/write just need to
kva-map the page cache page and copy to/from the current process's user
address space.
> So roughly for all non-device driver stuff the implementation on Intel
> could be:
> 1) Add a per-page flag PG_highmem
> 2) At boot time set per-page flags in this way based upon what
> where in physical memory the page is and what PAGE_OFFSET the
> kernel is using.
I'd prefer just to fix the PAGE_OFFSET in this model, and specify 1GB
kernel VA at all times. 800MB can be physically mapped, non-highmem
pages; 100MB can give us a decent size of kva mapping; and the remainder
is free for vmalloc. Of course, we'll have to use 4k page tables for
the 100MB kva VA, which kind of sucks, but it's better than remapping
for every single highmem access.
> 3) Add some mechanism to tell __get_free_pages() that PG_highmem
> pages are OK to use.
> 4) Fix do_no_page and do_wp_page code paths to allocate anonymous
> pages allowing the special PG_highmem type.
Add the page cache to that. Page cache is no less or more difficult
than anonymous pages, since in both cases we need to be able to support
(or give the illusion of) direct IO into highmem (think mmap and/or
swapping). For pages above 4G, that illusion will have to be achieved
transparently via bounce buffers in ll_rw_block.
> 5) Implement __copy_high_page() and __clear_high_page() for x86,
> by picking 2 unused virtual addresses in kernel space and making
> temporary mappings on the local processor during the copy/clear
> page operation. This may get tricky because of how swapper_pg_dir
> is shared amongst processors, but actually since the local task
> has its own page table it should work _iff_ all copy/clear page
> operations happen in such a context during page faults (I think
> they do).
Much better to use the page_kva() style, and make the temporary mapping
persistent in the short to medium term with LRU recycling of kvas. And
yes, notification between CPUs is the tricky part. The best mechanism
may well be architecture-specific. One thought was to have a kva MM
sequence number on each CPU, so that any time we do a global
invalidate() we can increase the local sequence number and assume all
our local tlbs are uptodate; but until that time, we can still detect
kvas which may not be locally uptodate. That will at least allow us to
defer the SMP-global page invalidation until we detect a cross-CPU
conflict, so that the common case is fast: only if several CPUs need to
access the same kva without an intervening context switch do we
invalidate everywhere.
> 1) Some bits of code want to coalesce/uncoalesce buffer head buffers
> to/from pages by copying them one by one into/from the page. I
> haven't checked all the places which do this, so I have no
> suggestion about how to handle this.
The buffer cache itself probably needs to stay in lowmem, because of the
amount of filesystem code which uses it for metadata. Page cache IO
buffers are special temporary buffers, and it is easy enough to lock
them into kva while that IO is in progress: the buffer_heads will never
persist after the IO.
The SMP invalidation is probably the most interesting part of the whole
thing, in that there are lots of ways of doing it and it is not
particularly obvious up front what The Best way is. The rest of the
implementation should not be that complex.
--Stephen

Re: Linux and physical memory [ In reply to ]

sct at redhat

Jan 23, 1999, 12:37 PM

Post #15 of 15 (945 views)

Hi,
On Fri, 22 Jan 1999 00:11:09 -0800, "David S. Miller"
<davem@dm.cobaltmicro.com> said:
> This is where your whole scheme is dead wrong.
> There should be no tlb invalidation issues, everything should be local
> to the processor and the page tables being used by him. This is the
> whole crux behind my "make the mapping locally and temporarily only at
> the time of the access" scheme.
Danger! Danger!
We cannot rely on this working on an entirely per-CPU basis. Think
about read(2) and reading from the page cache: we have to kva-map the
page and then copy it to user space. The user space access can page
fault. We can be rescheduled on a different CPU.
If (as we'd prefer) the kva page tables are marked as persistent over
context switch (we can do that on P-Pro and later Intel CPUs), then we
have just found ourselves able to schedule to another CPU without a
reliable protection on the TLBs.
We have to cater for the same kva being used on multiple CPUs, or else
just pin the process to that CPU while the kva is in use (yuck).
There are obvious ways around this, of course, such as marking the
number of kva-locked pages in each process's task struct, and doing an
invalidate() on the kva range every time we reschedule any process with
non-zero kva pages to a different CPU. That at least will avoid any
extra work in the normal straight-through non-faulting code path.
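
The bookkeeping described here amounts to one check at reschedule time. A minimal sketch, assuming a per-task counter and a record of the last CPU (both field names are hypothetical, and the actual invalidation is left to the caller):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task bookkeeping; field names are illustrative only. */
struct task {
    int last_cpu;      /* CPU the task last ran on */
    int kva_locked;    /* number of kva mappings the task currently holds */
};

/*
 * Called when `t` is about to run on `cpu`. Returns true when the kva
 * range must be invalidated on that CPU first: the task holds kva
 * mappings and is moving to a CPU whose TLB may be stale for them.
 */
static bool kva_flush_needed(struct task *t, int cpu)
{
    assert(t->kva_locked >= 0);
    bool migrated = (t->last_cpu != cpu);
    t->last_cpu = cpu;
    return migrated && t->kva_locked > 0;
}
```

A task that stays on its CPU, or that migrates while holding no kva mappings, takes the false branch and pays nothing extra, which is the property the paragraph above is after.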
The other advantage of a common pool of kva addresses is that it should
allow us to have a "cache" of kvas: for example, all commonly used page
cache pages will already have a kva assigned, and we won't need to
perform _any_ tlb invalidation, local or remote, to access them.
> Really, how much does the following cost:
> __cli();
> *pte1 = make_pte(PAGE_KERNEL, MAGIC_VADDR1);
> *pte2 = make_pte(PAGE_KERNEL, MAGIC_VADDR1);
> touch_high_pages_in_some_way()...
> *pte1 = 0; flush_tlb_page(MAGIC_VADDR1);
> *pte2 = 0; flush_tlb_page(MAGIC_VADDR1);
> __sti();
Good question. As I've said, we cannot make this atomic because of the
possibility of taking a page fault in the middle of it, but otherwise
this is what we ideally want to aim for. Just how expensive is a
single-page local tlb flush on the various 32-bit architectures? If it
is cheap enough then using the kva space as a uniform, cross-CPU cache
to avoid tlb flushes isn't really worthwhile. If it is expensive then
eliminating invalidations for access to commonly touched pages will be
worth the extra expense involved when we need to pass modified kvas between
CPUs.

> 1) Local tlb flushes == 2 * X86_CYCLES_PER_TLB_PAGE_FLUSH
If it's that cheap then yes, do it per CPU and invalidate everything if
we detect one of the processes crossing between CPUs.
--Stephen