[Security News = Dong-Hyeon Lim, Supervisor of Smart City Service Team, Seocho-gu Office] In August 2020, Seocho-gu formed a consortium with CPRO, NamuPlanet, and the National Safety Competency Association, and the consortium was selected with the highest score in the open-topic category of the second 'Artificial Intelligence Learning Dataset Construction Project' run by the Ministry of Science and ICT and the National Information Society Agency. The dataset is being built through February 2021.

▲ A view of Seocho-gu Office [Photo = Seocho-gu Office]
The theme proposed by Seocho-gu is 'building a bird's-eye-view dataset for measuring the congestion of vehicles and people in urban areas', reflecting the character of its downtown district. Seocho-gu leases the rooftops of 20 landmark buildings in the district, where it installs and operates high-performance wireless CCTV cameras. Together with the consortium, it plans to process the bird's-eye-view footage collected from these cameras into a reference dataset for improving the performance of AI engines for tasks such as counting and measuring congestion and density.

▲ Screenshot of the results of the AI learning dataset construction project (2nd round) [Photo = Seocho-gu Office]
The government has designated labeling as a new occupation under the "Data Dam" New Deal project, which includes the AI learning dataset construction project, and has promised active support. This is the first dataset construction project in which a local government serves as the supervising organization, so I would like to summarize the Seocho-gu Bird's Eye View dataset as follows, to help shape future government policy and develop the industry.
Start of the project: “data acquisition”
Seocho-gu launched the project by installing 40 fixed cameras on the rooftops of landmark buildings to secure bird's-eye-view angles and acquiring footage from them. As shown in Pictures 2 through 5, the main targets are places where congestion actually occurs, such as Sadang Station, Gangnam Station, the Express Bus Terminal, Gangnam-daero, and the Gyeongbu Expressway, so that the dataset is meaningful for measuring congestion. To reflect the varied needs of industry, high-quality images were acquired under a range of settings: image format (photo and video), image size (2 megapixels and 5 megapixels), compression method (H.264 and H.265), distance (far and near), shooting angle (from 30 to 90 degrees relative to the ground), and object size (from the size at which personal information can no longer be identified down to smaller sizes).

▲ Bird's Eye View (Pictures 2 to 5, clockwise from top left) [Picture = Seocho-gu Office]
'Data processing' for personal information protection
The acquired data may include personally identifiable information. Therefore, instead of the conventional practice of distributing files to workers or having them download files, the base work environment was built on VDI (Virtual Desktop Infrastructure) as a fundamental safeguard against the leakage and infringement of personal information. Labelers use the VMware Horizon VDI client to access the Seocho-gu Bird's Eye View platform and to work and save their results there without copying data out, and administrators monitor and evaluate quality only inside the same VDI environment, in compliance with the law. As shown in Pictures 6 to 11, the work platform in the virtualized environment is built on the open-source CVAT, so that labelers can work intuitively, inspectors conducting three rounds of review have convenient tools, and managers have detailed visibility into the work. For images in which specific individuals or vehicle license plate numbers can be identified, such as Photos 12 to 15, only the relevant areas are masked, minimizing damage to the original and training data.
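The article does not say how the masking is implemented. As a minimal sketch, assuming frames are grayscale pixel grids and that the face or license-plate regions are already available as rectangles (the `Box`, `mask_regions`, and `masked_fraction` names are hypothetical), region-only masking leaves every pixel outside the boxes untouched:

```python
from dataclasses import dataclass

# Hypothetical frame format: a grayscale image as rows of pixel values.
Image = list[list[int]]


@dataclass(frozen=True)
class Box:
    """Hypothetical rectangular region to hide, e.g. a face or license plate."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive


def mask_regions(image: Image, boxes: list[Box], fill: int = 0) -> Image:
    """Return a copy of `image` with only the pixels inside `boxes` overwritten."""
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]  # copy so the original frame is preserved
    for b in boxes:
        # Clamp each box to the frame bounds before filling it.
        for y in range(max(0, b.y0), min(height, b.y1)):
            for x in range(max(0, b.x0), min(width, b.x1)):
                out[y][x] = fill
    return out


def masked_fraction(height: int, width: int, boxes: list[Box]) -> float:
    """Share of the frame altered by masking (overlapping boxes counted once)."""
    if height <= 0 or width <= 0:
        return 0.0
    covered = set()
    for b in boxes:
        for y in range(max(0, b.y0), min(height, b.y1)):
            for x in range(max(0, b.x0), min(width, b.x1)):
                covered.add((y, x))
    return len(covered) / (height * width)
```

Keeping the masked area as small as the boxes themselves is what limits how much of each frame stops being usable as training data; `masked_fraction` makes that cost measurable per frame.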
▲ Clockwise from top left: Picture 6, after launching VDI; Picture 7, platform screen; Picture 8, worker screen; Pictures 9 to 11, manager monitoring screens [Photo = Seocho-gu Office]

▲ From left, Photos 12 and 13: annotator work results (people) [Photo = Seocho-gu Office]
▲ From left, Photos 14 and 15: annotator work results (vehicles) [Photo = Seocho-gu Office]
‘Providing data’ that satisfies users
The Seocho-gu consortium will deliver the following results: images and metadata (Photo 18) labeled from 300 hours of footage each for people and vehicles; 14,400 hours of original bird's-eye-view video; and, for differential use to improve AI reliability, video encoded with both H.264 and H.265 compression together with 1,200 background images (Photo 19). A legal review of the actual footage is underway to check whether personal information can be identified in any of the videos, and only reviewed data will be uploaded to aihub.nia.or.kr once inspection by TTA is completed. To make the dataset more useful, the metadata also includes object flow direction, environmental information such as fine-dust readings, and the distance and angle between each camera and its field of view as measured with a laser rangefinder. For reference, the dataset was planned jointly, from the users' perspective, by a hardware manufacturer (CPRO) and an AI service company (NamuPlanet) within the Seocho-gu consortium, with the aim of producing a 'sufficiently useful dataset' under current conditions.
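The article lists the kinds of metadata but not their schema. The record below is a hypothetical sketch: every field name and unit is an assumption, and only the two codecs and the 30 to 90 degree shooting-angle range come from the text above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ClipMetadata:
    """Hypothetical per-clip metadata record; field names are illustrative only."""
    camera_id: str
    codec: str                 # "H.264" or "H.265", the two formats named in the article
    flow_direction_deg: float  # dominant object flow direction, in [0, 360)
    fine_dust_ug_m3: float     # fine-dust reading in micrograms per cubic metre (assumed unit)
    camera_distance_m: float   # laser-measured distance from camera to scene
    shooting_angle_deg: float  # relative to the ground; the article cites 30 to 90 degrees

    def __post_init__(self) -> None:
        # Reject values outside the ranges described in the article or physically impossible.
        if self.codec not in ("H.264", "H.265"):
            raise ValueError(f"unsupported codec: {self.codec}")
        if not 30.0 <= self.shooting_angle_deg <= 90.0:
            raise ValueError("shooting angle must be between 30 and 90 degrees")
        if not 0.0 <= self.flow_direction_deg < 360.0:
            raise ValueError("flow direction must be in [0, 360)")
        if self.camera_distance_m <= 0:
            raise ValueError("camera distance must be positive")

    def to_json(self) -> str:
        """Serialize the record with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

Validating at construction time means a clip with, say, a 20-degree angle is rejected before it reaches the published metadata.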

▲ Photo 16: KakaoTalk Open Chat where annotators ask questions (left); Photo 17: KakaoTalk Open Chat where answers to annotator questions are standardized (right) [Photo = Seocho-gu Office]

▲ Photo 18: labeled metadata [Photo = Seocho-gu Office]
▲ Photo 19: background image for difference recognition [Photo = Seocho-gu Office]
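The article does not describe how the background images are meant to be used. One common way an empty-scene background supports difference recognition is simple background subtraction; the sketch below assumes aligned grayscale frames of equal size and a hypothetical threshold of 30, and is not the consortium's stated method.

```python
Image = list[list[int]]  # hypothetical grayscale frame format


def foreground_mask(frame: Image, background: Image, threshold: int = 30) -> list[list[bool]]:
    """Mark pixels that differ from the empty-scene background by more than `threshold`."""
    if len(frame) != len(background) or any(
        len(a) != len(b) for a, b in zip(frame, background)
    ):
        raise ValueError("frame and background must have the same shape")
    return [
        [abs(p - q) > threshold for p, q in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


def foreground_ratio(mask: list[list[bool]]) -> float:
    """Fraction of pixels flagged as foreground, a crude occupancy signal."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0
```

A real pipeline would also have to handle lighting changes and camera shake, which is one reason a dataset would ship many background images rather than one.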
'Data utilization' in various areas
The dataset prepared in this way can serve not only as reference data for AI counting engines and AI services that measure congestion (density), but also as reference data for various pilot services in the ITS (Intelligent Transportation Systems) domain. In addition, because labeling needs we have not yet anticipated may arise at any time, a large amount (14,400 hours = 30 angles of view × 20 days × 24 hours) of high-quality original footage will also be provided through AI Hub along with the labels and metadata.
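Written out, the footage volume quoted above follows directly from the capture schedule of 30 angles of view recorded around the clock for 20 days:

```latex
30\ \text{angles of view} \times 20\ \text{days} \times 24\ \tfrac{\text{h}}{\text{day}}
  = 30 \times 480\ \text{h}
  = 14{,}400\ \text{h of original footage}
```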
'Proof as a service' through a pilot service starting in January
For the Bird's Eye View demonstration service, Seocho-gu will select two bird's-eye-view angles of view on a small scale, overlay the results of real-time AI congestion analysis on the original video stream, and operate a pilot service from January on the web (birdeyeview.seocho.go.kr) and in the app (Seocho Smart City). In particular, as a demonstration of a COVID-19 response convergence service, the real-time AI congestion analysis results will be linked to media such as SIP broadcasting terminals and electronic signs, which will issue periodic announcements according to the congestion level, demonstrating how the dataset can contribute to public health.
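The article does not specify how congestion levels are defined or mapped to announcements. A minimal sketch, assuming the AI engine reports a person count for a known ground area and that the operator chooses density thresholds; the threshold values, level names, and message texts below are all hypothetical:

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical density thresholds in people per square metre, checked from highest to lowest.
THRESHOLDS = ((2.0, Level.HIGH), (0.5, Level.MEDIUM))

# Hypothetical announcement texts; LOW congestion triggers no announcement.
MESSAGES = {
    Level.MEDIUM: "The area is getting crowded. Please keep your distance.",
    Level.HIGH: "The area is very crowded. Please consider another route.",
}


def congestion_level(people: int, area_m2: float) -> Level:
    """Classify a person count over a known area into a congestion level."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    density = people / area_m2
    for limit, level in THRESHOLDS:
        if density >= limit:
            return level
    return Level.LOW


def announcement(people: int, area_m2: float) -> Optional[str]:
    """Message to send to signs or broadcast terminals, or None if no announcement is needed."""
    return MESSAGES.get(congestion_level(people, area_m2))
```

A scheduler would call `announcement` once per reporting period with the latest count, which matches the periodic announcements described above; real deployments would likely add hysteresis so the level does not flicker near a threshold.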
Suggestions for improving dataset reliability and quality
Because the metadata that AI service developers need may vary, it is necessary to collect a large and varied body of original data beyond what is labeled. In other words, dataset construction projects should come to be seen, at least in part, as projects for purchasing original data, and everyone involved must share responsibility for the reliability and quality of the dataset.
When planning a project, organizers should aim to create a neutral dataset. Associations representing the industry (○○○○ Research Association, ○○○ City Association, ○○○○ Technology Association, etc.) should also take part in planning, to establish broader, more universal standards and widen the scope of the data. In particular, no one should be unable to join, or be forced to drop out of, a dataset construction project because specialized companies could not be brought into a consortium, as happened in the second round.
Participating consortia also need diverse members. Dataset projects should not be concentrated on a few specialized companies that operate labelers and on emerging AI software companies; service providers, from hardware manufacturers to companies that apply AI services, should also take part so that the AI ecosystem everyone wants can be created.
I hope that public institutions will also take an interest in dataset projects and participate actively. In particular, where high-quality data is required, the public sector must take part in data acquisition and refinement. When the host institution is a public body, it will not allow data reliability to decline, so the project can expect to benefit from higher data quality.
Participating companies also need higher ethical standards. I hope that no one will submit results built from training data acquired through inappropriate channels, such as YouTube or metadata purchased from China, or undermine the purpose of the project with inaccurate labeling and thereby hinder the development of the AI ecosystem.
For reference, inexpensive video datasets that are already circulating on the market or can easily be purchased in China, even when they raise no legal issue, amount to 'insignificant volume', 'nothing unique', and 'statistical error' from an AI perspective. Such data must be filtered out.
Lastly, from a public-welfare perspective, labeling is a promising job that can be offered to people who are marginalized or who find it difficult to re-enter the workforce. I hope dataset projects become more active, and above all, fair pay and working conditions should be defined and guaranteed. The Seocho-gu consortium pays each labeler about 100,000 won for an expected 8-hour workday and is striving to fulfill the purpose of the project.
[Written by Dong-Hyeon Lim, Supervisor of Smart City Service Team, Seocho-gu Office]
Source: www.boannews.com/html/detail.html?idx=93760&tab_type=1