Mailing List Archive

Re: [Wikitech-l] Abstract Schema and Schema changes, request for help
Thank you so much for working on this, it was one of the most painful
aspects of core development!

It might be worth using a consistent gerrit topic or hashtag to make
finding the relevant patches easy.

On Sat, May 9, 2020 at 3:21 AM Amir Sarabadani <ladsgroup@gmail.com> wrote:

> Hello,
> In case you haven't done any changes on database schema of mediawiki core,
> let me explain the process to you (if you know this, feel free to skip this
> paragraph):
> * Mediawiki core supports three types of RDBMS: MySQL, Sqlite, Postgres. It
> used to be five (plus Oracle and MSSQL)
> * For each one of these types, you need to do three parts: 1- Change the
> tables.sql file so new installations get the new schema 2- Make .sql schema
> change file, like an "ALTER TABLE" for current installations so they can
> upgrade. 3- Wire that schema change file into *Updater.php file.
> * For example, this is a patch to drop a column:
> https://gerrit.wikimedia.org/r/c/mediawiki/core/+/473601 This file touches
> 14 different files, adds 94 lines and removes 30.
>
> This is bad for several reasons:
> * It is extremely complicated to do a even a simple schema change. Usually
> something as simple as adding an column takes a whole day for me. There are
> lots of complicating factors, like Sqlite doesn't have ALTER TABLE, so when
> you want to make a patch for adding a column, you need to make a temporary
> table with the new column, copy the old table data to it, drop the old
> table and then rename the old table.
> ** Imagine the pain and sorrow when you want to normalize a table meaning
> you need to do several schema changes: 1- Add a table, 2- Add a column on
> the old table, 3- make the column not-nullable when it's filled and make
> the old column nullable instead 4- drop the old column.
> * It's almost impossible to test all DBMS types, I don't have MSSQL or
> Oracle installed and I don't even know their differences with MySQL. I
> assume most other developers are good in one type, not all.
> * Writing raw sqls, specially duplicated ones, and doubly specially when we
> don't have CI to test (because we won't install propriety software in our
> infra) is pretty much prone to error. My favourite one was that a new
> column on a table was actually added to the wrong table in MSSQL and it
> went unnoticed for two years (four releases, including one LTS).
> * It's impossible to support more DBMS types through extensions or other
> third party systems. Because the maintainer needs to keep up with all
> patches we add to core and write their equivalents.
> * For lots of reasons, these schemas are diverging, there have been several
> work to just reduce this to a minimum.
>
> There was a RFC to introduce abstract schema and schema changes and it got
> accepted and I have been working to implement this:
> https://phabricator.wikimedia.org/T191231
>
> This is not a small task, and like any big work, it's important to cut it
> to small pieces and gradually improve things. So my plan is first, I
> abstract the schema (tables.sql files), then slowly I abstract schema
> changes. For now, the plan is to make these .sql files automatically
> generated through maintenance scripts. So we will have a file called
> tables.json and when running something like:
> php maintenance/generateSchemaSql.php --json maintenance/tables.json --sql
> maintenance/tables-generated.sql --type=mysql
> It would produce tables-generated.sql file. The code that produces it is
> Doctrine DBAL and this is already installed as a dev dependency of core
> because you would need Doctrine if you want to make a schema change, if you
> maintain an instance, you should not need anything. Most of the work for
> automatically generating schema is already merged and the last part that
> wires it (and migrates two tables) is up for review:
> https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595240
>
> My request is that I need to make lots of patches and since I'm doing this
> in my volunteer capacity, I need developers to review (and potentially help
> with the work if you're excited about this like me). Let me know if you're
> willing to be added in future patches and the current patch also welcomes
> any feedback: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595240
>
> I have added the documentation in
> https://www.mediawiki.org/wiki/Manual:Schema_changes for the plan and
> future changes. The ideal goal is that when you want to do a schema change,
> you just change tables.json and create a json file that is snapshot of
> before and after table (remember, sqlite doesn't have alter table, meaning
> it has to know the whole table). Also, once we are in a good shape in
> migrating mediawiki core, we can start cleaning up extensions.
>
> Any feedback is also welcome.
>
> Best
> --
> Amir (he/him)
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
_______________________________________________
MediaWiki-l mailing list
To unsubscribe, go to:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Re: [Wikitech-l] Abstract Schema and Schema changes, request for help [ In reply to ]
Oh thanks for the great idea. I started "abstract-schema" hashtag and added
them to as many as possible:
https://gerrit.wikimedia.org/r/q/hashtag:%22abstract-schema%22+(status:open%20OR%20status:merged)

I just want to say these three still need reviewing:

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595311
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595316
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595289

Best

On Mon, May 18, 2020 at 1:47 PM Gergo Tisza <gtisza@wikimedia.org> wrote:

> Thank you so much for working on this, it was one of the most painful
> aspects of core development!
>
> It might be worth using a consistent gerrit topic or hashtag to make
> finding the relevant patches easy.
>
> On Sat, May 9, 2020 at 3:21 AM Amir Sarabadani <ladsgroup@gmail.com>
> wrote:
>
> > Hello,
> > In case you haven't done any changes on database schema of mediawiki
> core,
> > let me explain the process to you (if you know this, feel free to skip
> this
> > paragraph):
> > * Mediawiki core supports three types of RDBMS: MySQL, Sqlite, Postgres.
> It
> > used to be five (plus Oracle and MSSQL)
> > * For each one of these types, you need to do three parts: 1- Change the
> > tables.sql file so new installations get the new schema 2- Make .sql
> schema
> > change file, like an "ALTER TABLE" for current installations so they can
> > upgrade. 3- Wire that schema change file into *Updater.php file.
> > * For example, this is a patch to drop a column:
> > https://gerrit.wikimedia.org/r/c/mediawiki/core/+/473601 This file
> touches
> > 14 different files, adds 94 lines and removes 30.
> >
> > This is bad for several reasons:
> > * It is extremely complicated to do a even a simple schema change.
> Usually
> > something as simple as adding an column takes a whole day for me. There
> are
> > lots of complicating factors, like Sqlite doesn't have ALTER TABLE, so
> when
> > you want to make a patch for adding a column, you need to make a
> temporary
> > table with the new column, copy the old table data to it, drop the old
> > table and then rename the old table.
> > ** Imagine the pain and sorrow when you want to normalize a table meaning
> > you need to do several schema changes: 1- Add a table, 2- Add a column on
> > the old table, 3- make the column not-nullable when it's filled and make
> > the old column nullable instead 4- drop the old column.
> > * It's almost impossible to test all DBMS types, I don't have MSSQL or
> > Oracle installed and I don't even know their differences with MySQL. I
> > assume most other developers are good in one type, not all.
> > * Writing raw sqls, specially duplicated ones, and doubly specially when
> we
> > don't have CI to test (because we won't install propriety software in our
> > infra) is pretty much prone to error. My favourite one was that a new
> > column on a table was actually added to the wrong table in MSSQL and it
> > went unnoticed for two years (four releases, including one LTS).
> > * It's impossible to support more DBMS types through extensions or other
> > third party systems. Because the maintainer needs to keep up with all
> > patches we add to core and write their equivalents.
> > * For lots of reasons, these schemas are diverging, there have been
> several
> > work to just reduce this to a minimum.
> >
> > There was a RFC to introduce abstract schema and schema changes and it
> got
> > accepted and I have been working to implement this:
> > https://phabricator.wikimedia.org/T191231
> >
> > This is not a small task, and like any big work, it's important to cut it
> > to small pieces and gradually improve things. So my plan is first, I
> > abstract the schema (tables.sql files), then slowly I abstract schema
> > changes. For now, the plan is to make these .sql files automatically
> > generated through maintenance scripts. So we will have a file called
> > tables.json and when running something like:
> > php maintenance/generateSchemaSql.php --json maintenance/tables.json
> --sql
> > maintenance/tables-generated.sql --type=mysql
> > It would produce tables-generated.sql file. The code that produces it is
> > Doctrine DBAL and this is already installed as a dev dependency of core
> > because you would need Doctrine if you want to make a schema change, if
> you
> > maintain an instance, you should not need anything. Most of the work for
> > automatically generating schema is already merged and the last part that
> > wires it (and migrates two tables) is up for review:
> > https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595240
> >
> > My request is that I need to make lots of patches and since I'm doing
> this
> > in my volunteer capacity, I need developers to review (and potentially
> help
> > with the work if you're excited about this like me). Let me know if
> you're
> > willing to be added in future patches and the current patch also welcomes
> > any feedback: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595240
> >
> > I have added the documentation in
> > https://www.mediawiki.org/wiki/Manual:Schema_changes for the plan and
> > future changes. The ideal goal is that when you want to do a schema
> change,
> > you just change tables.json and create a json file that is snapshot of
> > before and after table (remember, sqlite doesn't have alter table,
> meaning
> > it has to know the whole table). Also, once we are in a good shape in
> > migrating mediawiki core, we can start cleaning up extensions.
> >
> > Any feedback is also welcome.
> >
> > Best
> > --
> > Amir (he/him)
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



--
Amir (he/him)
_______________________________________________
MediaWiki-l mailing list
To unsubscribe, go to:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Re: [Wikitech-l] Abstract Schema and Schema changes, request for help [ In reply to ]
Anyone to review these? There are more fifty tables and at this speed,
it'll take a handful of years to migrate them :(

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595311
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595316
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595289

On Tue, May 19, 2020 at 11:40 AM Amir Sarabadani <ladsgroup@gmail.com>
wrote:

> Oh thanks for the great idea. I started "abstract-schema" hashtag and
> added them to as many as possible:
> https://gerrit.wikimedia.org/r/q/hashtag:%22abstract-schema%22+(status:open%20OR%20status:merged)
>
> I just want to say these three still need reviewing:
>
> https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595311
> https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595316
> https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595289
>
> Best
>
> On Mon, May 18, 2020 at 1:47 PM Gergo Tisza <gtisza@wikimedia.org> wrote:
>
>> Thank you so much for working on this, it was one of the most painful
>> aspects of core development!
>>
>> It might be worth using a consistent gerrit topic or hashtag to make
>> finding the relevant patches easy.
>>
>> On Sat, May 9, 2020 at 3:21 AM Amir Sarabadani <ladsgroup@gmail.com>
>> wrote:
>>
>> > Hello,
>> > In case you haven't done any changes on database schema of mediawiki
>> core,
>> > let me explain the process to you (if you know this, feel free to skip
>> this
>> > paragraph):
>> > * Mediawiki core supports three types of RDBMS: MySQL, Sqlite,
>> Postgres. It
>> > used to be five (plus Oracle and MSSQL)
>> > * For each one of these types, you need to do three parts: 1- Change the
>> > tables.sql file so new installations get the new schema 2- Make .sql
>> schema
>> > change file, like an "ALTER TABLE" for current installations so they can
>> > upgrade. 3- Wire that schema change file into *Updater.php file.
>> > * For example, this is a patch to drop a column:
>> > https://gerrit.wikimedia.org/r/c/mediawiki/core/+/473601 This file
>> touches
>> > 14 different files, adds 94 lines and removes 30.
>> >
>> > This is bad for several reasons:
>> > * It is extremely complicated to do a even a simple schema change.
>> Usually
>> > something as simple as adding an column takes a whole day for me. There
>> are
>> > lots of complicating factors, like Sqlite doesn't have ALTER TABLE, so
>> when
>> > you want to make a patch for adding a column, you need to make a
>> temporary
>> > table with the new column, copy the old table data to it, drop the old
>> > table and then rename the old table.
>> > ** Imagine the pain and sorrow when you want to normalize a table
>> meaning
>> > you need to do several schema changes: 1- Add a table, 2- Add a column
>> on
>> > the old table, 3- make the column not-nullable when it's filled and make
>> > the old column nullable instead 4- drop the old column.
>> > * It's almost impossible to test all DBMS types, I don't have MSSQL or
>> > Oracle installed and I don't even know their differences with MySQL. I
>> > assume most other developers are good in one type, not all.
>> > * Writing raw sqls, specially duplicated ones, and doubly specially
>> when we
>> > don't have CI to test (because we won't install propriety software in
>> our
>> > infra) is pretty much prone to error. My favourite one was that a new
>> > column on a table was actually added to the wrong table in MSSQL and it
>> > went unnoticed for two years (four releases, including one LTS).
>> > * It's impossible to support more DBMS types through extensions or other
>> > third party systems. Because the maintainer needs to keep up with all
>> > patches we add to core and write their equivalents.
>> > * For lots of reasons, these schemas are diverging, there have been
>> several
>> > work to just reduce this to a minimum.
>> >
>> > There was a RFC to introduce abstract schema and schema changes and it
>> got
>> > accepted and I have been working to implement this:
>> > https://phabricator.wikimedia.org/T191231
>> >
>> > This is not a small task, and like any big work, it's important to cut
>> it
>> > to small pieces and gradually improve things. So my plan is first, I
>> > abstract the schema (tables.sql files), then slowly I abstract schema
>> > changes. For now, the plan is to make these .sql files automatically
>> > generated through maintenance scripts. So we will have a file called
>> > tables.json and when running something like:
>> > php maintenance/generateSchemaSql.php --json maintenance/tables.json
>> --sql
>> > maintenance/tables-generated.sql --type=mysql
>> > It would produce tables-generated.sql file. The code that produces it is
>> > Doctrine DBAL and this is already installed as a dev dependency of core
>> > because you would need Doctrine if you want to make a schema change, if
>> you
>> > maintain an instance, you should not need anything. Most of the work for
>> > automatically generating schema is already merged and the last part that
>> > wires it (and migrates two tables) is up for review:
>> > https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595240
>> >
>> > My request is that I need to make lots of patches and since I'm doing
>> this
>> > in my volunteer capacity, I need developers to review (and potentially
>> help
>> > with the work if you're excited about this like me). Let me know if
>> you're
>> > willing to be added in future patches and the current patch also
>> welcomes
>> > any feedback: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/595240
>> >
>> > I have added the documentation in
>> > https://www.mediawiki.org/wiki/Manual:Schema_changes for the plan and
>> > future changes. The ideal goal is that when you want to do a schema
>> change,
>> > you just change tables.json and create a json file that is snapshot of
>> > before and after table (remember, sqlite doesn't have alter table,
>> meaning
>> > it has to know the whole table). Also, once we are in a good shape in
>> > migrating mediawiki core, we can start cleaning up extensions.
>> >
>> > Any feedback is also welcome.
>> >
>> > Best
>> > --
>> > Amir (he/him)
>> > _______________________________________________
>> > Wikitech-l mailing list
>> > Wikitech-l@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>> _______________________________________________
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> Amir (he/him)
>
>

--
Amir (he/him)
_______________________________________________
MediaWiki-l mailing list
To unsubscribe, go to:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l