SQL for machine learning
Yi Wang edited this page Oct 19, 2018 · 3 revisions
      
Let us start with a very simple case.
Suppose that we want to regress salary on age and gender. We would train a model using the following SQL statement:
SELECT age, gender, salary
FROM   engineer_info, engineer_payment
WHERE  engineer_info.id = engineer_payment.id
TRAIN  DNNRegressor
WITH   hidden_units = [10, 30]
COLUMN clip(age, 18, 65), gender, cross(clip(age, 18, 65), gender)
LABEL  salary
INTO   my_first_model
;
This generates a table my_first_model, which encodes:
- inputs: age, gender
- columns: clipped age, gender, and the cross of clipped age and gender
- the label: salary
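
To make the field-to-feature mapping concrete, here is a minimal Python sketch of what the COLUMN clause above might compute for each selected row. It is an illustration under assumptions, not the system's implementation: the helper names `clip`, `cross`, and `to_features` are hypothetical, the sample rows are made up, and a plain tuple stands in for the crossed feature (a real system might hash the combination into buckets instead).

```python
def clip(value, low, high):
    """Clamp value into the closed interval [low, high]."""
    return max(low, min(value, high))


def cross(*features):
    """Combine several feature values into one joint categorical feature.

    Assumption: a plain tuple stands in for the crossed feature; a real
    system might hash the combination into a fixed number of buckets.
    """
    return tuple(features)


def to_features(row):
    """Mirror COLUMN clip(age, 18, 65), gender, cross(clip(age, 18, 65), gender)."""
    clipped_age = clip(row["age"], 18, 65)
    return {
        "clipped_age": clipped_age,
        "gender": row["gender"],
        "age_x_gender": cross(clipped_age, row["gender"]),
    }


# Hypothetical rows, shaped like the result of the SELECT clause.
rows = [
    {"age": 16, "gender": "Female", "salary": 1000.0},
    {"age": 40, "gender": "Male", "salary": 5000.0},
    {"age": 70, "gender": "Female", "salary": 7000.0},
]

features = [to_features(r) for r in rows]  # what COLUMN would produce
labels = [r["salary"] for r in rows]       # what LABEL would select
```

The split mirrors the statement: SELECT decides which raw fields exist in `rows`, COLUMN decides how they become `features`, and LABEL picks the training target.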
 
We see that we need both SELECT, to specify the fields to retrieve, and COLUMN, for the field-to-feature mapping.
Given this model, we can infer the salary for any other group of people. For example, the execution of the following statement
SELECT id, age, sex
FROM   another_company_employee_info
INFER  my_first_model
COLUMN age, vocab(sex, ["Female", "Male"])
LABEL  expected_salary
INTO   a_new_table
;
should generate a new table a_new_table with fields:
- id, age, sex from SELECT, and
- expected_salary from LABEL
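
The shape of the output table can be sketched in Python as follows. This is only an illustration of the schema described above: `infer_rows` and `placeholder_predict` are hypothetical names, the rows are made up, and the placeholder function is not a trained model.

```python
def infer_rows(selected_rows, predict, label_name):
    """Append one predicted field, named by LABEL, to every selected row."""
    output = []
    for row in selected_rows:
        new_row = dict(row)  # keep the SELECT fields unchanged
        new_row[label_name] = predict(row)
        output.append(new_row)
    return output


def placeholder_predict(row):
    """Hypothetical stand-in for my_first_model; not a trained model."""
    return 100.0 * row["age"]


# Hypothetical rows, shaped like the result of SELECT id, age, sex.
selected = [
    {"id": 1, "age": 30, "sex": "Female"},
    {"id": 2, "age": 45, "sex": "Male"},
]

a_new_table = infer_rows(selected, placeholder_predict, "expected_salary")
```

Each output row carries the selected fields followed by the single field named in LABEL.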
 
Again, we need COLUMN in addition to SELECT to map different field names, and even field values, to the features acceptable by the model.
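
As a rough sketch of the value mapping, `vocab(sex, ["Female", "Male"])` could be read as a lookup that turns each string into its position in the list. The Python below assumes that reading; the function names are hypothetical, and so are the choices to raise an error on unknown values and to feed the result to the model under the name `gender`.

```python
def vocab(value, vocabulary):
    """Map a categorical value to its index in the vocabulary list.

    Assumption: values outside the vocabulary raise an error; a real
    system might map them to a reserved out-of-vocabulary index instead.
    """
    try:
        return vocabulary.index(value)
    except ValueError:
        raise ValueError(
            f"{value!r} is not in vocabulary {vocabulary!r}"
        ) from None


def to_model_features(row):
    """Mirror COLUMN age, vocab(sex, ["Female", "Male"])."""
    return {
        "age": row["age"],
        # Assumption: the model's gender feature takes the vocabulary index.
        "gender": vocab(row["sex"], ["Female", "Male"]),
    }
```

For example, `to_model_features({"id": 7, "age": 30, "sex": "Female"})` yields a feature dictionary without `id`, since `id` appears in SELECT but not in COLUMN.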