Re: [GSOC 18] Discussion on the datatable

From: Mark Wong <mark(at)2ndQuadrant(dot)com>
To: Hongyuan Ma <cs_maleicacid(at)163(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GSOC 18] Discussion on the datatable
Date: 2018-03-07 16:10:57
Message-ID: 20180307161057.GA23953@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hongyuan,

On Tue, Mar 06, 2018 at 01:36:23AM +0800, Hongyuan Ma wrote:
> Hi Mark,
> In the past few days I have read some code in pgperffarm.git repository.I look forward to discussing the project in detail with you and gradually defining the datasheet structure and refining the requirements. Here are some of my ideas, if there are any errors or deficiencies, please feel free to correct me.
>
>
> To create a datasheet: pg_perf_cate
> Overview:
> pg_perf_cate table is used to store performance test project categories that support multi-level categories.
>
>
> Description:
> In line 12 of the "pgperffarm \ client \ benchmarks \ runner.py" file there is a line like this:
>
>
> ''
> 'manages runs of all the benchmarks, including cluster restarts etc.'
> ''
>
>
> In my imagination, there may be more items of performance tests than build tests. Based on the above comments, I guess, for example, may be there are "single instance of performance tests","cluster performance tests", "other performance tests" three major categories. Each major category also contains their own test sub-categories, such as addition tests and deletion tests and so on. In the pg_perf_cate table, the cate_pid field indicates the parent category of the current test category. If the pid field of a row of data has a value of 0, the row represents the top-level category.
>
>
> Related Functions:
> - Maybe in the navigation bar we can create a category menu to help users quickly find their own interest in the test items (similar to the Amazon Mall product categories). The cate_order field is used to manage the order of the categories in the current level for easy front-end menu display.
> - In the admin back-end need a page which can add or modify categories.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> To create a datasheet: pg_perf_test
> Overview:
> The pg_perf_test table is used to store specific test items, including the test item number(test_id), the name of the test item(test_name), the ID of the sub-category(cate_id), the description of the test item (test_desc,to be discussed), and the person ID(user_id).
>
>
> Description:
> In line 15 of the "pgperffarm \ client \ benchmarks \ pgbench.py" file, I see a note like this:
> ''
> # TODO allow running custom scripts, not just the default
> ''
> Now that I want to allow users to run custom test scripts and upload them, I think it is necessary to tell others how to run the test again. So I want to add a test_desc field that will store the details about this test and tell the user how to run this test. But I am not too sure about the storage format for the details of the test, perhaps the rich text format or markdown format is a suitable choice.
> When this test item is added by the administrator, the user_id field has a value of 0. Otherwise, this field corresponds to the user_id field in the user table. For this field, I prefer not to use foreign keys.
>
>
> Related Functions:
> - At the front end, each test has its own detail page, on which the test related content is presented and a list of test results is listed.
> - In the admin background need a page to manage test items.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> To create a datasheet: pg_perf_test_result
>
>
> Overview:
> The pg_perf_test_result table is used to store test results, including at least the result ID(result_id), user ID (user_id,I prefer not to create a user-test result association table), test item ID(test_id), test branch number(branch_id), system configuration(os_config), pg configuration(pg_config), test result details(result_desc) , test time(add_time) and other fields.
> Confusion:
> I think compared to other tables, pg_perf_test_result table may be a relatively complex one.
> This piece of code can be seen around line 110 of the "pgperffarm \ client \ benchmarks \ runner.py" file:
> ''
> r ['meta'] = {
> 'benchmark': config ['benchmark'],
> 'date': strftime ("% Y-% m-% d% H:% M:% S.000000 + 00", gmtime ()),
> 'name': config_name,
> 'uname': uname,
> }
>
>
> with open ('% s / results.json'% self._output, 'w') as f:
> f.write (json.dumps (r, indent = 4))
> ''
> Could you please provide a sample results.json file so that I can better understand what information is contained in the uploaded data and design the datasheet based on it.

Don't let this distract you too much from finishing your current term.
There will be plenty of time to hammer out the schema.

Here's a brief description of the data that is summarized in a json
object:

The idea is that the json document represents a series of tests
against one particular commit. Then detail of each individual test
run.

{
"pgbench": {
"ro": {
"results at a scale factor": {
"number of clients": {
"std": 4389.727756305762,
"metric": 41390.590287,
"median": 43137.716637,
"results": [
{
... pgbench results of a test ...
}
...
]
},
....
},
"rw": {
... read write test results ...
},
... other tests results
}

> Related Function(s):
> - At the front end, when the user clicks the test result item in the list, it jumps to the test result details page, which show the test result information in detail.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> To create a datasheet: pg_perf_test_branch
>
>
> Overview:
> The pg_perf_test_branch table is used to store the branch version, including the branch ID(branch_id) and branch name(branch_name) fields.
> Related Function(s):
> At the front end and back end, provides the ability to search test results by branch version.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> To create a data table: pg_perf_user
> Overview:
> pg_perf_user table is used to register the user ID(user_id), user's name, (user_name)user's email address(user_email), the user's unique key (to be discussed, used to generate user certificates), registration time(register_time), the last login ip(last_ip) fields.
> Confusion:
> I tried to fill in the information on the registration page of the build farm project and click the submit button, but unfortunately the page shows "500 - Internal Server Error". So I can only try to design the user table by imagination. If you've ever used buildfarm's registration feature, please tell me more details about it.
>
>
> Related Functions
> - Possibly: The user sends a confirmation email and a certificate file to the user's mailbox after registering.
> - Provid the ability to search test results by username.
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> In the last email, you mentioned Django Rest Framework. I very much agree to use it, because it can be very good to help the site to achieve the separation of the front and back ends. At the same time I very much hope to use Vue as a front-end development framework. Vue provides the concept of component is relatively easy to use. Vue is widely used in many projects (including NASA), so I think vue is trustworthy.
>
>
> My postgresql.org community account is:
> Username: maleicacid
> Email: cs_maleicacid(at)163(dot)com
>
>
> I hope to get permission to the pgperffarm.git repository. (In fact, I think it would be better to pull request than to submit code directly, because my code may need to be reviewed.) I would like to create the basic model class.

You can fork the repository somewhere public to allow us to pull and
review from. That's how we're likely going to work with whoever is
accepted.

Regards,
Mark

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2018-03-07 16:22:10 Re: public schema default ACL
Previous Message David Steele 2018-03-07 15:56:32 Re: PATCH: Configurable file mode mask