apache spark - How to get counts for null, not null, distinct values, and all rows for all columns in a SQL table?

I'm looking to build a table that shows, for every column of a given table, the count of null values, the count of not-null values, the count of distinct values, and the total row count. This happens to be in Databricks (Apache Spark).

Something like the table below: one row per source column, with col_position, col_name, all_records, not_null, null_values, and distinct_values.

I know I can do this with something like the SQL shown below. I could also use something like Java or Python to generate the SQL (a sketch of that idea follows the query).

The questions are:

- Is this the most efficient approach?

- Is there a better way to write this query (less typing and/or more efficient)?

-- one branch per column, combined with union all (no distinct needed);
-- null_values is used as an alias because null is a reserved word
select 
  1 col_position,
  'location_id' col_name,
  count(*) all_records,
  count(location_id) not_null,
  count(*) - count(location_id) null_values,
  count(distinct location_id) distinct_values
from
  admin

union all
select 
  2 col_position,
  'location_zip' col_name,
  count(*) all_records,
  count(location_zip) not_null,
  count(*) - count(location_zip) null_values,
  count(distinct location_zip) distinct_values
from
  admin

union all
select 
  3 col_position,
  'provider_npi' col_name,
  count(*) all_records,
  count(provider_npi) not_null,
  count(*) - count(provider_npi) null_values,
  count(distinct provider_npi) distinct_values
from
  admin

order by col_position
;
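As a rough sketch of the "generate the SQL with Python" idea, the column list can be read from the table's schema and the per-column branches built in a loop. The table name admin comes from the query above; everything else (a Databricks notebook with a spark SparkSession already available, column names that don't need backtick quoting) is an assumption, not a tested solution.

# Hedged sketch: build the per-column profiling query in PySpark instead of typing it by hand.
# Assumes a Databricks notebook where `spark` already exists and the table is named admin.
columns = spark.table("admin").columns  # pull the column list from the table schema

branches = []
for pos, col in enumerate(columns, start=1):
    branches.append(f"""
select
  {pos} col_position,
  '{col}' col_name,
  count(*) all_records,
  count({col}) not_null,
  count(*) - count({col}) null_values,
  count(distinct {col}) distinct_values
from admin""")

profile_sql = "\nunion all".join(branches) + "\norder by col_position"
spark.sql(profile_sql).show()  # run the generated query and display the result

This just builds the same union-all query shown above without the hand typing; whether that query is the most efficient approach is still the open question.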


1 Answer

Waiting for an expert to answer.


...